In this document, we build a linear log odds model of probability of superiority judgments through a process of model expansion, where we will gradually add predictors to our model.

The LLO model follows from related work suggesting that the human perception of probability is encoded on a log odds scale. On this scale, the slope of a linear model represents the shape and severity of the function describing bias in probability perception. The greater the deviation from a slope of 1 (i.e., ideal performance), the more biased the judgments of probability. Slopes less than one correspond to the kind of bias predicted by excessive attention to the mean. On the same log odds scale, the intercept determines a crossover point, which should be proportional to the number of categories of possible outcomes among which probability is divided. In our case, the crossover point should be about 0.5 in probability units (i.e., a log odds of 0) since workers are judging the probability of a team getting more points with a new player than without.
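
To make the role of the slope concrete, here is a minimal sketch of the LLO response function in base R. The function name `llo_response` and the parameter values are illustrative assumptions, not estimates from our data.

```r
# Linear log odds (LLO) response function: a sketch, not the fitted model.
# `llo_response`, `slope`, and `intercept` are illustrative names.
llo_response <- function(p_true, slope, intercept) {
  # linear on the log odds scale, then back-transformed to probability units
  plogis(intercept + slope * qlogis(p_true))
}

p_true <- c(0.55, 0.75, 0.95)

# slope = 1 and intercept = 0 reproduce the ground truth (no bias)
unbiased <- llo_response(p_true, slope = 1, intercept = 0)

# slope < 1 pulls responses toward the crossover point (here 0.5)
compressed <- llo_response(p_true, slope = 0.5, intercept = 0)

round(cbind(p_true, unbiased, compressed), 3)
```

With a slope below 1, every predicted response sits between 0.5 and the ground truth, which is the pattern of underestimation we expect to see in the fitted models.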

Load and Prepare Data

We load worker responses from our experiment and do some preprocessing.

# read in data 
full_df <- read_csv("experiment-anonymous.csv")
## Parsed with column specification:
## cols(
##   .default = col_double(),
##   workerId = col_character(),
##   condition = col_character(),
##   start_means = col_logical(),
##   gender = col_character(),
##   age = col_character(),
##   education = col_character(),
##   chart_use = col_character(),
##   strategy_with_means = col_character(),
##   strategy_without_means = col_character(),
##   outcome = col_logical(),
##   trial = col_character(),
##   trialIdx = col_character()
## )
## See spec(...) for full column specifications.
# preprocessing
responses_df <- full_df %>%
  rename( # rename to convert away from camel case
    worker_id = workerId,
    ground_truth = groundTruth,
    sd_diff = sdDiff,
    p_award_with = pAwardWith,
    p_award_without = pAwardWithout,
    account_value = accountValue,
    p_superiority = pSup,
    start_time = startTime,
    resp_time = respTime,
    trial_dur = trialDur,
    trial_idx = trialIdx
  ) %>%
  # remove practice and mock trials from responses dataframe, leave in full version
  filter(trial_idx != "practice", trial_idx != "mock") %>% 
  # drop rows where p_superiority == NA for some reason
  drop_na(p_superiority) %>%
  # mutate rows where intervene == -1 for some reason
  mutate(
    intervene = if_else(intervene == -1,
                        # repair
                        if_else((payoff == (award_value - 1) | payoff == -1),
                                1, # paid for intervention
                                0), # didn't pay for intervention
                        # don't repair
                        as.numeric(intervene) # hack to avoid type error
                        )
  ) %>%
  # set up factors for modeling
  mutate(
    # add a variable to note whether the chart they viewed showed means
    means = as.factor((start_means & as.numeric(trial) <= (n_trials / 2)) | (!start_means & as.numeric(trial) > (n_trials / 2))),
    start_means = as.factor(start_means),
    sd_diff = as.factor(sd_diff),
    trial_number = as.numeric(trial)
  )

head(responses_df)
## # A tibble: 6 x 38
##   worker_id batch n_trials n_data_conds condition baseline es_threshold
##   <chr>     <dbl>    <dbl>        <dbl> <chr>        <dbl>        <dbl>
## 1 7819bfb6     17       34           18 intervals      0.5          0.9
## 2 7819bfb6     17       34           18 intervals      0.5          0.9
## 3 7819bfb6     17       34           18 intervals      0.5          0.9
## 4 7819bfb6     17       34           18 intervals      0.5          0.9
## 5 7819bfb6     17       34           18 intervals      0.5          0.9
## 6 7819bfb6     17       34           18 intervals      0.5          0.9
## # … with 31 more variables: start_means <fct>, award_value <dbl>,
## #   starting_value <dbl>, exchange <dbl>, cutoff <dbl>, max_bonus <dbl>,
## #   total_bonus <dbl>, duration <dbl>, numeracy <dbl>, gender <chr>, age <chr>,
## #   education <chr>, chart_use <chr>, strategy_with_means <chr>,
## #   strategy_without_means <chr>, account_value <dbl>, ground_truth <dbl>,
## #   intervene <dbl>, outcome <lgl>, p_award_with <dbl>, p_award_without <dbl>,
## #   p_superiority <dbl>, payoff <dbl>, resp_time <dbl>, sd_diff <fct>,
## #   start_time <dbl>, trial <chr>, trial_dur <dbl>, trial_idx <chr>,
## #   means <fct>, trial_number <dbl>
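
The `means` variable above encodes block order: workers who start with means see them in the first half of trials, and everyone else sees them in the second half. Here is a small base R sketch of that logic for two hypothetical workers, assuming 34 trials as in the rows printed above.

```r
# Toy illustration of the block-order logic behind `means`
n_trials <- 34
trial <- 1:n_trials

# helper mirroring the expression used in the preprocessing step
shows_means <- function(start_means, trial, n_trials) {
  (start_means & trial <= n_trials / 2) | (!start_means & trial > n_trials / 2)
}

means_start_with    <- shows_means(TRUE, trial, n_trials)   # means in first block
means_start_without <- shows_means(FALSE, trial, n_trials)  # means in second block

table(means_start_with)
```

The two hypothetical workers see means on complementary halves of the trial sequence, which is what the counterbalancing relies on.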

We need to prepare the data for modeling. We censor responses to the range 0.5% to 99.5%, where responses at these bounds reflect an intended response at the bound or beyond. By censoring responses at 0.5% and 99.5%, we assume that the response scale has a resolution of 1% in practice. We need to do this to avoid values of positive or negative infinity when we transform responses to a log odds scale. We convert both probability of superiority judgments and the ground truth to a logit scale.

# create data frame for model
model_df <- responses_df %>%
  mutate( 
    # recode responses greater than 99.5% and less than 0.5% to avoid values of +/- Inf on a logit scale
    p_superiority = if_else(p_superiority > 99.5, 
                            99.5,
                            if_else(p_superiority < 0.5,
                                    0.5,
                                    as.numeric(p_superiority))),
    # apply logit function to p_sup judgments and ground truth
    lo_p_sup = qlogis(p_superiority / 100),
    lo_ground_truth = qlogis(ground_truth),
    # # scale and center lo_ground_truth
    # clo_ground_truth = (lo_ground_truth - mean(lo_ground_truth)) / (max(lo_ground_truth) - min(lo_ground_truth)),
    # scale and center trial order
    trial = (trial_number - as.numeric(n_trials) / 2) / as.numeric(n_trials)
  )
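
To see why the censoring step above is needed, here is a short base R sketch with made-up responses. The vector `raw_pct` is hypothetical, and `pmin()`/`pmax()` stand in for the nested `if_else()` calls.

```r
# Responses at the boundaries map to infinite log odds
qlogis(c(0, 1))

# Hypothetical raw responses in percent units
raw_pct <- c(0, 0.2, 50, 99.9, 100)

# Censor to [0.5, 99.5], mirroring the nested if_else() above
censored_pct <- pmin(pmax(raw_pct, 0.5), 99.5)

# After censoring, every response has a finite value on the logit scale
lo <- qlogis(censored_pct / 100)
range(lo)
```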

Now, let’s apply our exclusion criteria, cutting our sample down to only the subset of participants who passed both attention checks.

# determine exclusions
exclude_df <- model_df %>% 
  # attention check trials where ground truth = c(0.5, 0.999)
  mutate(failed_check = (ground_truth == 0.5 & intervene != 0) | (ground_truth == 0.999 & intervene != 1)) %>%
  group_by(worker_id) %>%
  summarise(
    failed_attention_checks = sum(failed_check),
    unique_p_sup = length(unique(p_superiority)),
    # excluded if they failed either attention check or used fewer than three levels of the response scale
    exclude = failed_attention_checks > 0 | unique_p_sup < 3
  ) %>% 
  dplyr::select(worker_id, exclude)

# apply exclusion criteria and remove attention check trials from modeling data set
model_df <- model_df %>% 
  left_join(exclude_df, by = "worker_id") %>% 
  filter(exclude == FALSE) %>%
  filter(ground_truth > 0.5 & ground_truth < 0.999)

# how many remaining workers per condition?
model_df %>%
  group_by(condition, start_means) %>% # between subject manipulations
  summarise(
    n_workers = length(unique(worker_id))
  )
## # A tibble: 8 x 3
## # Groups:   condition [4]
##   condition start_means n_workers
##   <chr>     <fct>           <int>
## 1 densities FALSE              79
## 2 densities TRUE               78
## 3 HOPs      FALSE              79
## 4 HOPs      TRUE               76
## 5 intervals FALSE              80
## 6 intervals TRUE               80
## 7 QDPs      FALSE              77
## 8 QDPs      TRUE               77

In addition to excluding participants who failed at least one of the two attention checks in the experiment, which is our preregistered exclusion criterion, we also exclude a handful of workers whose data lead to model fit issues. These are workers who responded with only one or two levels of the probability of superiority scale. We could make the case that these workers might not have been trying very hard when responding, but the reason for excluding them is much more practical: It is not possible for the modeling process we are using to estimate random effects on response variability for these participants (i.e., there is too little spread in a set with only one or two distinct values to estimate its variance). These random effects on variance are very important, because our data almost certainly violate a homogeneity of variance assumption.
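
One way to see the problem, assuming worker-level spread is modeled on a log scale (as in the hierarchical models later in this document), is to compare a hypothetical worker who gives the same answer on every trial with one whose answers vary. Both workers below are made up for illustration.

```r
# Hypothetical workers on the log odds scale
constant_worker <- rep(qlogis(0.75), 32)               # same response every trial
varied_worker   <- qlogis(c(0.55, 0.6, 0.75, 0.9, 0.95))

sd_constant <- sd(constant_worker)
sd_varied   <- sd(varied_worker)

# A standard deviation of zero has no finite value on a log scale
log(sd_constant)
log(sd_varied)
```

A worker with no spread in their responses pushes the worker-level estimate of log residual standard deviation toward negative infinity, which is the kind of behavior that causes fitting problems.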

Because of these exclusions, we are a few participants short of our target sample size of 80 per condition. We should still have more than enough data to support statistical inferences. Here we drop a handful of additional participants to maintain counterbalancing of block order. Since we know that there were some participants with dropped responses, let’s prioritize leaving out workers with the greatest number of dropped trials in each counterbalancing condition.

model_df %>%
  group_by(condition, start_means, worker_id) %>%
  summarise(
    n_trials = n(),
    dropped_trials = 32 - n_trials
  ) %>%
  filter(dropped_trials > 0)
## # A tibble: 5 x 5
## # Groups:   condition, start_means [3]
##   condition start_means worker_id n_trials dropped_trials
##   <chr>     <fct>       <chr>        <int>          <dbl>
## 1 densities TRUE        e4b46997        24              8
## 2 HOPs      FALSE       c488db75         5             27
## 3 HOPs      FALSE       ce016e09        25              7
## 4 HOPs      FALSE       f430e2e8        28              4
## 5 intervals FALSE       ff8a2a69        28              4

Based on a comparison of the two tables above, we’ll drop workers c488db75, ce016e09, and f430e2e8 to ensure our ability to fit our model.

# remove workers with missing data, plus one where condition = densities, start_means = FALSE
model_df <- model_df %>%
  filter(!worker_id %in% c("c488db75", "ce016e09", "f430e2e8")) # also exclude "c337674a" to counterbalance, but would take a long time to rerun

model_df %>%
  group_by(condition, start_means) %>% # between subject manipulations
  summarise(
    n_workers = length(unique(worker_id))
  )
## # A tibble: 8 x 3
## # Groups:   condition [4]
##   condition start_means n_workers
##   <chr>     <fct>           <int>
## 1 densities FALSE              79
## 2 densities TRUE               78
## 3 HOPs      FALSE              76
## 4 HOPs      TRUE               76
## 5 intervals FALSE              80
## 6 intervals TRUE               80
## 7 QDPs      FALSE              77
## 8 QDPs      TRUE               77

Now we have our dataset ready for modeling.

Distribution of Probability of Superiority Judgments

We start as simply as possible by just modeling the distribution of probability of superiority judgments on the log odds scale.

Before we fit the model to our data, let’s check that our priors seem reasonable. We’ll use a weakly informative prior for the intercept parameter since we want the population-level centered intercept to be flexible. We set the expected value of the prior on the intercept equal to the mean value of the ground truth that we sampled (in log odds units).

# get mean value of ground truth sampled in log odds units
model_df %>% select(lo_ground_truth) %>% summarize(mean = mean(lo_ground_truth))
## # A tibble: 1 x 1
##    mean
##   <dbl>
## 1  1.30
# get_prior(data = model_df, family = "gaussian", formula = lo_p_sup ~ 1)

# starting as simple as possible: learn the distribution of lo_p_sup
prior.lo_p_sup <- brm(data = model_df, family = "gaussian",
              lo_p_sup ~ 1,
              prior = c(prior(normal(1.3, 1), class = Intercept),
                        prior(normal(0, 1), class = sigma)),
              sample_prior = "only",
              iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 3 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems

Let’s look at our prior predictive distribution. For this intercept model, it should be skewed left because we have located our prior near 79% probability of superiority (i.e., a log odds of 1.3). We should see a peak near the upper bound of the probability scale.

# prior predictive check
model_df %>%
  select() %>%
  add_predicted_draws(prior.lo_p_sup, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    prior_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = prior_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Prior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())
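
As a rough cross-check on the prior predictive plot, we can simulate from the same priors by hand in base R, without compiling a model. This is a sketch under the assumption that the `normal(0, 1)` prior on sigma is truncated at zero (i.e., half-normal); the variable names are illustrative.

```r
set.seed(1234)
n_draws <- 10000

# Draw parameters from the priors used above
intercept <- rnorm(n_draws, mean = 1.3, sd = 1)
sigma     <- abs(rnorm(n_draws, mean = 0, sd = 1))  # half-normal, since sigma > 0

# Draw one predicted response per parameter draw, in log odds units
lo_p_sup_sim <- rnorm(n_draws, mean = intercept, sd = sigma)

# Transform to probability units and summarize
prior_p_sup_sim <- plogis(lo_p_sup_sim)
median(prior_p_sup_sim)      # should land near plogis(1.3), roughly 0.79
mean(prior_p_sup_sim > 0.9)  # share of prior mass near the upper bound
```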

Now, let’s fit the model to data. This is just trying to estimate the mean response regardless of the ground truth.

# starting as simple as possible: learn the distribution of lo_p_sup
m.lo_p_sup <- brm(data = model_df, family = "gaussian",
              lo_p_sup ~ 1,
              prior = c(prior(normal(1.3, 1), class = Intercept),
                        prior(normal(0, 1), class = sigma)),
              iter = 3000, warmup = 500, chains = 2, cores = 2,
              file = "model-fits/lo_mdl")

Check diagnostics:

# trace plots
plot(m.lo_p_sup)

# pairs plot
pairs(m.lo_p_sup)

# model summary
print(m.lo_p_sup)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: lo_p_sup ~ 1 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.57      0.01     0.55     0.59 1.00     2957     3127
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.29      0.01     1.28     1.30 1.00     5269     3285
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select() %>%
  add_predicted_draws(m.lo_p_sup, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
    ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority",
       post_p_sup = NULL) +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Our model is not sensitive to the ground truth, so we expect to see a mismatch here.

Linear Log Odds Model of Probability of Superiority

Now we’ll add in a slope parameter to make our model sensitive to the ground truth. This is the simplest version of our linear log odds (LLO) model.

Before we fit the model to our data, let’s check that our priors seem reasonable. Since we are now including a slope parameter for the ground truth in our model, we can dial down the width of our prior for sigma (i.e., residual variance) to avoid over-dispersion of predicted responses.

# get_prior(data = model_df, family = "gaussian", formula = lo_p_sup ~ lo_ground_truth)

# simple LLO model
prior.llo <- brm(data = model_df, family = "gaussian",
                 lo_p_sup ~ lo_ground_truth,
                 prior = c(prior(normal(1, 0.5), class = b),
                           prior(normal(1.3, 1), class = Intercept),
                           prior(normal(0, 0.5), class = sigma)),
                 sample_prior = "only",
                 iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 2 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems

Let’s look at our prior predictive distribution. For this linear model, we should see density spread slightly more evenly across probability values.

# prior predictive check
model_df %>%
  select(lo_ground_truth) %>%
  add_predicted_draws(prior.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    prior_p_sup = plogis(lo_p_sup)
    ) %>%
  ggplot(aes(x = prior_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Prior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Now let’s fit the model to data.

# simple LLO model
m.llo <- brm(data = model_df, family = "gaussian",
             lo_p_sup ~ lo_ground_truth,
             prior = c(prior(normal(1, 0.5), class = b),
                       prior(normal(1.3, 1), class = Intercept),
                       prior(normal(0, 0.5), class = sigma)),
             iter = 3000, warmup = 500, chains = 2, cores = 2,
             file = "model-fits/llo_mdl")

Check diagnostics:

# trace plots
plot(m.llo)

# pairs plot
pairs(m.llo)

Our slope and intercept parameters seem pretty highly correlated. Maybe adding hierarchy to our model will remedy this.
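
One possible contributor to this correlation, offered as an assumption rather than a diagnosis of our model, is that the predictor is not centered at zero. The sketch below uses simulated data and an ordinary least squares fit via `lm()` as a stand-in for the Bayesian model, so the numbers are only illustrative.

```r
set.seed(1234)
n <- 500

# Simulated predictor with a mean well away from zero, like lo_ground_truth
x <- rnorm(n, mean = 1.3, sd = 1)
y <- -0.1 + 0.5 * x + rnorm(n, sd = 1)

# Correlation between intercept and slope estimates, uncentered predictor
fit_raw <- lm(y ~ x)
cor_raw <- cov2cor(vcov(fit_raw))[1, 2]

# Same model with the predictor centered at its sample mean
x_c <- x - mean(x)
fit_centered <- lm(y ~ x_c)
cor_centered <- cov2cor(vcov(fit_centered))[1, 2]

c(uncentered = cor_raw, centered = cor_centered)
```

In this toy setting the estimates are strongly negatively correlated when the predictor is uncentered and essentially uncorrelated once it is centered.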

# model summary
print(m.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: lo_p_sup ~ lo_ground_truth 
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -0.13      0.02    -0.16    -0.10 1.00     3695     3646
## lo_ground_truth     0.54      0.01     0.52     0.56 1.00     3685     3739
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.20      0.01     1.18     1.21 1.00     6959     4010
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
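
To get a feel for what these estimates imply, we can plug the posterior means from the summary above into the LLO equation by hand. This is a back-of-the-envelope sketch that ignores posterior uncertainty; the variable names are illustrative.

```r
# Posterior means copied from the model summary above
intercept <- -0.13
slope     <- 0.54

# Predicted response (at the posterior means) when the ground truth is 50% and 90%
pred_at_50 <- plogis(intercept + slope * qlogis(0.5))
pred_at_90 <- plogis(intercept + slope * qlogis(0.9))

# Crossover point: the ground truth at which the predicted response equals
# the ground truth, i.e., x = intercept + slope * x in log odds units
crossover <- plogis(intercept / (1 - slope))

round(c(pred_at_50 = pred_at_50, pred_at_90 = pred_at_90, crossover = crossover), 3)
```

A slope this far below 1 means that predicted responses at high ground truth values fall well short of the ground truth.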

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth) %>%
  add_predicted_draws(m.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Our model is now sensitive to the ground truth, but it is still having trouble fitting the data. It may be that the model is not capturing the individual variability in response patterns. Next we’ll add hierarchy to our model.

Add Hierarchy for Slope, Intercepts, and Sigma

The models we’ve created thus far fail to account for much of the variability in the data. Here, we attempt to parse some heterogeneity in responses by modeling a random effect of worker on slopes, intercepts, and residual variance. This introduces a hierarchical component to our model in order to account for individual differences in the best fitting linear model for each worker’s data.

Before we fit the model to our data, let’s check that our priors seem reasonable. We are adding hyperpriors for the standard deviation of slopes, intercepts, and residual variation (i.e., sigma) per worker, as well as the correlation between them. We’ll set moderately wide priors on these worker-level slope and intercept effects. We want some regularization, but we don’t want to overregularize potentially large individual variability, which is sort of a tough balance. We’ll also narrow the priors on sigma parameters since we are now attributing variability to more sources and we want to avoid overdispersion. We’ll set a prior on the correlation between slopes and intercepts per worker that avoids large absolute correlations.

# get_prior(data = model_df, family = "gaussian", formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth, sigma ~ (1|sharecor|worker_id)))

# hierarchical LLO model
prior.wrkr.llo <- brm(data = model_df, family = "gaussian",
                      formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth, 
                                   sigma ~ (1|sharecor|worker_id)),
                      prior = c(prior(normal(1, 0.5), class = b),
                                prior(normal(1.3, 1), class = Intercept),
                                prior(normal(0, 0.15), class = sd, group = worker_id),
                                prior(normal(0, 0.15), class = sd, dpar = sigma),
                                prior(lkj(4), class = cor)),
                      sample_prior = "only",
                      iter = 3000, warmup = 500, chains = 2, cores = 2)
## Compiling the C++ model
## Start sampling
## Warning: There were 5 divergent transitions after warmup. Increasing adapt_delta above 0.8 may help. See
## http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
## Warning: Examine the pairs() plot to diagnose sampling problems
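
To quantify how strongly the `lkj(4)` prior discourages large correlations, we can use the known marginal distribution of a single correlation under an LKJ prior: for a d x d correlation matrix, (r + 1) / 2 follows a Beta(a, a) distribution with a = eta - 1 + d / 2. The helper `lkj_tail_prob` below is a hypothetical name, and the cutoff of 0.8 is an arbitrary choice for illustration.

```r
# Prior probability that a single correlation exceeds `cutoff` in absolute
# value under an LKJ(eta) prior on a d x d correlation matrix.
# Uses the marginal result (r + 1) / 2 ~ Beta(a, a), with a = eta - 1 + d / 2.
lkj_tail_prob <- function(eta, d, cutoff) {
  a <- eta - 1 + d / 2
  upper <- (1 + cutoff) / 2
  # the marginal is symmetric around zero, so double the upper tail
  2 * pbeta(upper, a, a, lower.tail = FALSE)
}

# d = 3: worker-level intercept, slope, and sigma intercept
tail_flat <- lkj_tail_prob(eta = 1, d = 3, cutoff = 0.8)
tail_lkj4 <- lkj_tail_prob(eta = 4, d = 3, cutoff = 0.8)

c(eta_1 = tail_flat, eta_4 = tail_lkj4)
```

Comparing the two values shows how much prior mass on extreme correlations we give up by moving from a uniform prior over correlation matrices (eta = 1) to eta = 4.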

Let’s look at our prior predictive distribution. Because this model contains so many more sources of variation, the prior predictive distribution may look a little overdispersed (i.e., lots of mass at the boundaries of the response scale). However, it’s probably best to err on the side of not making our priors on individual parameters too narrow.

# prior predictive check
model_df %>%
  select(lo_ground_truth, worker_id) %>%
  add_predicted_draws(prior.wrkr.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    prior_p_sup = plogis(lo_p_sup)
    ) %>%
  ggplot(aes(x = prior_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Prior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Now, let’s fit the model to our data.

# hierarchical LLO model
m.wrkr.llo <- brm(data = model_df, family = "gaussian",
                  formula = bf(lo_p_sup ~ (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth,  
                               sigma ~ (1|sharecor|worker_id)), 
                  prior = c(prior(normal(1, 0.5), class = b),
                            prior(normal(1.3, 1), class = Intercept),
                            prior(normal(0, 0.15), class = sd, group = worker_id),
                            prior(normal(0, 0.15), class = sd, dpar = sigma),
                            prior(lkj(4), class = cor)),
                  iter = 3000, warmup = 500, chains = 2, cores = 2,
                  control = list(adapt_delta = 0.99, max_treedepth = 12),
                  file = "model-fits/llo_mdl-wrkr")

Check diagnostics:

# trace plots
plot(m.wrkr.llo)

# pairs plot (fixed effects)
pairs(m.wrkr.llo, exact_match = TRUE, pars = c("b_Intercept", "b_lo_ground_truth", "b_sigma_Intercept"))

# pairs plot (random effects)
pairs(m.wrkr.llo, pars = c("sd_worker_id__", "cor_worker_id__"))

# model summary
print(m.wrkr.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth 
##          sigma ~ (1 | sharecor | worker_id)
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                      Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                            0.42      0.03     0.37     0.47 1.00
## sd(lo_ground_truth)                      0.45      0.01     0.43     0.48 1.00
## sd(sigma_Intercept)                      0.77      0.02     0.73     0.81 1.00
## cor(Intercept,lo_ground_truth)          -0.24      0.05    -0.33    -0.15 1.00
## cor(Intercept,sigma_Intercept)          -0.47      0.05    -0.56    -0.38 1.00
## cor(lo_ground_truth,sigma_Intercept)     0.58      0.03     0.52     0.64 1.00
##                                      Bulk_ESS Tail_ESS
## sd(Intercept)                             542      906
## sd(lo_ground_truth)                       433      651
## sd(sigma_Intercept)                      1398     2564
## cor(Intercept,lo_ground_truth)            335      894
## cor(Intercept,sigma_Intercept)            442      919
## cor(lo_ground_truth,sigma_Intercept)     1487     2396
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept          -0.15      0.02    -0.19    -0.11 1.00      542     1018
## sigma_Intercept    -0.73      0.03    -0.79    -0.66 1.00      618     1239
## lo_ground_truth     0.55      0.02     0.51     0.58 1.01      208      565
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
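
Note that sigma now has a log link, so the sigma estimates in this summary are in log units. As a rough guide to interpretation, we can exponentiate the posterior means by hand. This sketch ignores posterior uncertainty and correlations among worker-level effects.

```r
# Posterior means copied from the summary above (sigma is on a log scale)
sigma_intercept <- -0.73   # population-level log residual sd
sd_sigma_worker <- 0.77    # between-worker sd of log residual sd

# Residual sd for an average worker, in log odds units
sigma_typical <- exp(sigma_intercept)

# Residual sd for workers one sd below and one sd above average
sigma_range <- exp(sigma_intercept + c(-1, 1) * sd_sigma_worker)

round(c(typical = sigma_typical, low = sigma_range[1], high = sigma_range[2]), 2)
```

The spread between the low and high values gives a sense of how much residual variability differs from worker to worker, which is why we model sigma per worker rather than assuming homogeneous variance.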

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth, worker_id) %>%
  add_predicted_draws(m.wrkr.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
    ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Running a leave-one-out posterior predictive check, we can see that overall this model has decent predictive validity.

# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.llo)

# run LOO to get weights
loo <- loo(m.wrkr.llo, save_psis = TRUE, cores = 2)
## Warning: Found 183 observations with a pareto_k > 0.7 in model 'm.wrkr.llo'.
## With this many problematic observations, it may be more appropriate to use
## 'kfold' with argument 'K = 10' to perform 10-fold cross-validation rather than
## LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s look at posterior predictions per worker to get a more detailed sense of fit quality. When we make this kind of plot for model checks at the level of individual workers, we’ll look at a subset of workers to keep the number of charts generated to a reasonable number.

# two workers from each counterbalancing condition
model_check_set <- model_df %>% 
  group_by(start_means, condition, worker_id) %>%
  summarise() %>%
  top_n(2)
## Selecting by worker_id
model_check_set <- model_check_set$worker_id
model_check_df <- model_df %>%
  filter(worker_id %in% model_check_set)

model_check_df %>% 
  group_by(worker_id) %>%
  summarise()
## # A tibble: 16 x 1
##    worker_id
##    <chr>    
##  1 f27ed3b6 
##  2 f4f534e0 
##  3 f5d48035 
##  4 f796f54d 
##  5 f7f69f44 
##  6 f83e2827 
##  7 fa0f4b94 
##  8 fa22b8bb 
##  9 fba3405d 
## 10 fccb21d5 
## 11 fd15ec30 
## 12 fd3bea1b 
## 13 fdb8555e 
## 14 fe8936cd 
## 15 fee45dce 
## 16 ff8a2a69
model_check_df %>%
  # get posterior predictive distribution
  group_by(lo_ground_truth, worker_id) %>%
  add_predicted_draws(m.wrkr.llo, n = 500) %>%
  # plot
  ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

What does this look like in probability units?

model_check_df %>%
  # get posterior predictive distribution
  group_by(lo_ground_truth, worker_id) %>%
  add_predicted_draws(m.wrkr.llo, n = 500) %>%
  # plot
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

One thing we’re trying to gauge here is whether our model has predictive validity at the level of each worker. To examine this more closely we’ll look at QQ plots for residuals at the worker level.

model_check_df %>%
  # get posterior draws and transform
  add_predicted_draws(m.wrkr.llo, n = 500) %>%
  group_by(lo_ground_truth, worker_id) %>%
  summarise(
    p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
    z_residual = qnorm(p_residual)             # what are the z-scores of these cumulative probabilities?
  ) %>%
  # plot
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline() +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)
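
To clarify what these residuals measure, here is a base R sketch of the same calculation on simulated data where the predictive distribution is correct by construction. All values are made up; the point is that a well-calibrated model should produce z-scores that look roughly standard normal.

```r
set.seed(1234)
n_obs <- 200
n_draws <- 500

# Simulated observations and predictive draws from the same distribution,
# i.e., a well-calibrated model
y_obs <- rnorm(n_obs, mean = 0.5, sd = 1)
draws <- matrix(rnorm(n_obs * n_draws, mean = 0.5, sd = 1), nrow = n_obs)

# Proportion of predictive draws below each observation (row i vs. y_obs[i]),
# then the z-score of that cumulative probability
p_residual <- rowMeans(draws < y_obs)
z_residual <- qnorm(p_residual)

# Proportions of exactly 0 or 1 give infinite z-scores; drop them before summarizing
z_finite <- z_residual[is.finite(z_residual)]
c(mean = mean(z_finite), sd = sd(z_finite))
```

Departures from the diagonal in the worker-level QQ plots indicate places where the predictive distribution is miscalibrated for that worker.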

These don’t look great. We can see that there is some clustering of responses, probably reflecting a preference for round numbers on the response scale.

pp_check(m.wrkr.llo)
## Using 10 posterior samples for ppc type 'dens_overlay' by default.

As long as the location and scale of the predictions look reasonably in line with the empirical data (which they do), we don’t really care too much if the model doesn’t predict every small anomaly. This plot showing predictive densities alongside the observed data is reassuring insofar as we are doing a decent job of modeling the things we care about.

Let’s see if our predictive validity improves at the worker level when we add our experimental manipulations as predictors.

Add Predictors to Answer Research Questions

In order to answer our research questions, we need to account for the interaction of the ground truth with whether means are present vs absent, whether visualized uncertainty is high vs low, and what uncertainty visualization condition a user was assigned to. We’ll add predictors for each of these factors to our hierarchical model in turn.

Presence/Absence of the Mean

Our primary research question is how the presence of the mean impacts the slopes of linear models in log odds space. To test this, we’ll add an interaction between the presence of the mean and the ground truth.
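
For reference, the model we are about to fit can be written out as follows. The notation is ours and is only a sketch read off the brms formula: x_i is lo_ground_truth, m_i is 1 when means are present and 0 otherwise, and w[i] indexes the worker who gave response i.

```latex
\begin{aligned}
\text{lo\_p\_sup}_i &\sim \mathrm{Normal}\left(\mu_i,\ \sigma_{w[i]}\right) \\
\mu_i &= \left(\beta_0 + u_{0,w[i]}\right) + \left(\beta_1 + u_{1,w[i]}\right) x_i + \beta_2\, m_i + \beta_3\, x_i m_i \\
\log \sigma_{w[i]} &= \gamma_0 + u_{2,w[i]} \\
\left(u_{0,w},\ u_{1,w},\ u_{2,w}\right) &\sim \mathrm{MVN}\left(\mathbf{0},\ \Sigma\right)
\end{aligned}
```

Under this parameterization the population-level slope is \beta_1 when means are absent and \beta_1 + \beta_3 when means are present, so \beta_3 is the quantity that speaks to our research question.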

We use the same priors as we did for the previous model. Now, let’s fit the model to our data.

# hierarchical LLO model with fixed effects of the presence/absence of the mean on intercept and slope; residual sd varies by worker
m.wrkr.means.llo <- brm(data = model_df, family = "gaussian",
                        formula = bf(lo_p_sup ~  (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means,
                                     sigma ~ (1|sharecor|worker_id)),
                        prior = c(prior(normal(1, 0.5), class = b),
                                  prior(normal(1.3, 1), class = Intercept),
                                  prior(normal(0, 0.15), class = sd, group = worker_id),
                                  prior(normal(0, 0.15), class = sd, dpar = sigma),
                                  prior(lkj(4), class = cor)),
                        iter = 3000, warmup = 500, chains = 2, cores = 2,
                        control = list(adapt_delta = 0.99, max_treedepth = 12),
                        file = "model-fits/llo_mdl-wrkr_means")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.wrkr.means.llo)

  • Pairs plot
# pairs plot (fixed effects)
pairs(m.wrkr.means.llo, exact_match = TRUE, pars = c("b_Intercept",
                                                     "b_lo_ground_truth",
                                                     "b_meansTRUE",
                                                     "b_lo_ground_truth:meansTRUE",
                                                     "b_sigma_Intercept"))

# pairs plot (random effects)
pairs(m.wrkr.means.llo, exact_match = TRUE, pars = c("sd_worker_id__Intercept", 
                                                     "sd_worker_id__lo_ground_truth",
                                                     "sd_worker_id__sigma_Intercept"))

# pairs plot (correlations among random effects)
pairs(m.wrkr.means.llo, pars = c("cor_worker_id__"))

  • Summary
# model summary
print(m.wrkr.means.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means 
##          sigma ~ (1 | sharecor | worker_id)
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                      Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                            0.42      0.03     0.37     0.47 1.00
## sd(lo_ground_truth)                      0.46      0.01     0.43     0.48 1.00
## sd(sigma_Intercept)                      0.77      0.02     0.73     0.81 1.00
## cor(Intercept,lo_ground_truth)          -0.25      0.05    -0.33    -0.15 1.00
## cor(Intercept,sigma_Intercept)          -0.47      0.04    -0.56    -0.38 1.00
## cor(lo_ground_truth,sigma_Intercept)     0.58      0.03     0.52     0.64 1.00
##                                      Bulk_ESS Tail_ESS
## sd(Intercept)                             763     1288
## sd(lo_ground_truth)                       579     1862
## sd(sigma_Intercept)                      1377     2391
## cor(Intercept,lo_ground_truth)            732     1837
## cor(Intercept,sigma_Intercept)            799     1594
## cor(lo_ground_truth,sigma_Intercept)     1187     2528
## 
## Population-Level Effects: 
##                           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS
## Intercept                    -0.15      0.02    -0.19    -0.11 1.00      712
## sigma_Intercept              -0.73      0.03    -0.79    -0.67 1.00      845
## lo_ground_truth               0.54      0.02     0.51     0.58 1.01      392
## meansTRUE                     0.01      0.01    -0.01     0.02 1.00     9002
## lo_ground_truth:meansTRUE     0.00      0.00    -0.01     0.01 1.00     8119
##                           Tail_ESS
## Intercept                     2011
## sigma_Intercept               1171
## lo_ground_truth               1283
## meansTRUE                     3749
## lo_ground_truth:meansTRUE     3434
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth, worker_id, means) %>%
  add_predicted_draws(m.wrkr.means.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

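Running a leave-one-out (LOO) posterior predictive check, we can see that overall this model has decent predictive validity, although the Pareto k warning below indicates that the importance-sampling weights are unreliable for some observations.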

# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.means.llo)

# run LOO to get weights
loo <- loo(m.wrkr.means.llo, save_psis = TRUE, cores = 2)
## Warning: Found 176 observations with a pareto_k > 0.7 in model
## 'm.wrkr.means.llo'. With this many problematic observations, it may be more
## appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-
## validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means) %>%
  add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

What does this look like in probability units?

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means) %>%
  add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)
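
To make the probability-unit view concrete, here is a small base R sketch that pushes a few ground-truth probabilities through the population-level line. The intercept and slope are the rounded posterior means from the summary above; the sketch ignores worker-level effects, posterior uncertainty, and the near-zero effects of means, so it is only an approximation. The values in `p_true` are chosen for illustration.

```r
# population-level posterior means copied from the model summary (rounded)
intercept <- -0.15   # Intercept
slope     <-  0.54   # lo_ground_truth

p_true  <- c(0.10, 0.25, 0.50, 0.75, 0.90)     # illustrative ground-truth values
lo_pred <- intercept + slope * qlogis(p_true)   # linear model on the log odds scale
p_pred  <- plogis(lo_pred)                      # transform back to probability units

round(data.frame(p_true, p_pred), 2)
```

Because the slope is below 1, predicted responses at the extremes are pulled toward the middle of the scale, which is the pattern of bias the LLO slope is meant to capture.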

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.

model_check_df %>%
  add_predicted_draws(m.wrkr.means.llo, n = 500) %>%
  group_by(lo_ground_truth, worker_id) %>%
  summarise(
    p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
    z_residual = qnorm(p_residual)             # what are the z-scores of these cumulative probabilities?
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline() +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

These still look pretty terrible.

With this model we can take a first stab at addressing our research question about the presence of extrinsic means. What does the posterior for the slope of the LLO model look like when means are present vs absent, ignoring other manipulations for now? Since we are building a complex model, we’ll forgo calculating marginal effects by manually combining parameters. Instead we’ll use add_fitted_draws and compare_levels from tidybayes to get our slopes, and then we’ll take their weighted average, grouping by the parameters for which we want marginal effects.

model_df %>%
  group_by(means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.wrkr.means.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, .draw) %>%                        # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # average within each group (with no weights supplied, this is a plain mean)
  ggplot(aes(x = slope, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes for mean present/absent") +
  theme(panel.grid = element_blank())

Recall that a slope of 1 represents no bias. This chart suggests that people are biased whether or not means are added. We should not be surprised to see little to no effect in this model: the mean difference is a good heuristic for probability of superiority when variance of visualized estimates is high, but it is not a good heuristic when variance is low. Thus, we should expect to see the effect we are looking for as an interaction between the presence of the mean and the level of uncertainty.
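
The slope calculation used for this chart (the difference between fitted values at a ground truth of 1 and 0) can be sanity-checked on a toy example. The sketch below uses simulated data and an ordinary `lm` fit instead of our brms model, so all of the names and numbers in it are hypothetical. It shows that differencing fitted values on a two-point grid recovers the same slopes as combining coefficients by hand.

```r
# Simulated stand-in data (not the study's data).
set.seed(2)
n     <- 400
x     <- rnorm(n)                        # stand-in for lo_ground_truth
means <- rep(c(FALSE, TRUE), n / 2)      # stand-in for the means manipulation
# simulate a slope of 0.5 without means and 0.6 with means
y <- -0.2 + 0.5 * x + 0.1 * x * means + rnorm(n, sd = 0.3)

fit <- lm(y ~ x * means)

# fitted values only at x = 0 and x = 1, for each level of means
grid     <- expand.grid(x = c(0, 1), means = c(FALSE, TRUE))
grid$fit <- predict(fit, newdata = grid)

# slope per level of means = fit at x = 1 minus fit at x = 0
slopes <- tapply(grid$fit, grid$means, diff)

# the same slopes obtained by combining coefficients manually
b      <- coef(fit)
manual <- c(b[["x"]], b[["x"]] + b[["x:meansTRUE"]])

rbind(from_grid = as.numeric(slopes), from_coefs = manual)
```

The grid approach scales better as interactions are added, since the manual sums grow with every new predictor.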

Level of Uncertainty Shown

Another factor that we manipulate is the level of uncertainty presented to chart users. We expect level of uncertainty (sd_diff) to determine the impact of extrinsic means on performance. To test this, we’ll add an interaction between sd_diff, means, and the ground truth.

We use the same priors as we did for the previous model. Now, let’s fit the model to our data.

# hierarchical LLO model
m.wrkr.means.sd.llo <- brm(data = model_df, family = "gaussian",
                           formula = bf(lo_p_sup ~  (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means*sd_diff,
                                        sigma ~ (1|sharecor|worker_id)),
                           prior = c(prior(normal(1, 0.5), class = b),
                                     prior(normal(1.3, 1), class = Intercept),
                                     prior(normal(0, 0.15), class = sd, group = worker_id),
                                     # prior(normal(0, 0.3), class = b, dpar = sigma),
                                     prior(normal(0, 0.15), class = sd, dpar = sigma),
                                     prior(lkj(4), class = cor)),
                           iter = 3000, warmup = 500, chains = 2, cores = 2,
                           control = list(adapt_delta = 0.99, max_treedepth = 12),
                           file = "model-fits/llo_mdl-wrkr_means_sd")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.wrkr.means.sd.llo)

  • Pairs plot
# pairs plot (LLO params)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("b_Intercept", 
                                                        "b_lo_ground_truth",
                                                        "b_meansTRUE",
                                                        "b_sd_diff15",
                                                        "b_lo_ground_truth:meansTRUE",
                                                        "b_lo_ground_truth:sd_diff15",
                                                        "b_meansTRUE:sd_diff15",
                                                        "b_lo_ground_truth:meansTRUE:sd_diff15"))

# pairs plot (random effects on lo_p_sup)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("sd_worker_id__Intercept", 
                                                        "sd_worker_id__lo_ground_truth"))

# pairs plot (sigma params)
pairs(m.wrkr.means.sd.llo, exact_match = TRUE, pars = c("b_sigma_Intercept", 
                                                        "sd_worker_id__sigma_Intercept"))

pairs(m.wrkr.means.sd.llo, pars = c("cor_worker_id__"))

  • Summary
# model summary
print(m.wrkr.means.sd.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means * sd_diff 
##          sigma ~ (1 | sharecor | worker_id)
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 3000; warmup = 500; thin = 1;
##          total post-warmup samples = 5000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                      Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                            0.45      0.02     0.40     0.50 1.01
## sd(lo_ground_truth)                      0.45      0.01     0.42     0.48 1.00
## sd(sigma_Intercept)                      0.86      0.02     0.81     0.90 1.00
## cor(Intercept,lo_ground_truth)          -0.24      0.05    -0.32    -0.14 1.01
## cor(Intercept,sigma_Intercept)          -0.41      0.04    -0.49    -0.32 1.01
## cor(lo_ground_truth,sigma_Intercept)     0.58      0.03     0.52     0.63 1.00
##                                      Bulk_ESS Tail_ESS
## sd(Intercept)                             669     1433
## sd(lo_ground_truth)                       348      787
## sd(sigma_Intercept)                       831     2209
## cor(Intercept,lo_ground_truth)            323      807
## cor(Intercept,sigma_Intercept)            381      796
## cor(lo_ground_truth,sigma_Intercept)     1112     2331
## 
## Population-Level Effects: 
##                                     Estimate Est.Error l-95% CI u-95% CI Rhat
## Intercept                              -0.17      0.02    -0.20    -0.13 1.00
## sigma_Intercept                        -0.82      0.03    -0.89    -0.75 1.01
## lo_ground_truth                         0.48      0.02     0.45     0.52 1.01
## meansTRUE                              -0.00      0.01    -0.02     0.01 1.00
## sd_diff15                               0.03      0.01     0.02     0.05 1.00
## lo_ground_truth:meansTRUE              -0.00      0.01    -0.01     0.00 1.00
## lo_ground_truth:sd_diff15               0.11      0.01     0.10     0.12 1.00
## meansTRUE:sd_diff15                     0.02      0.01    -0.00     0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15     0.03      0.01     0.01     0.04 1.00
##                                     Bulk_ESS Tail_ESS
## Intercept                                320      857
## sigma_Intercept                          385     1104
## lo_ground_truth                          189      370
## meansTRUE                               6553     4641
## sd_diff15                               5668     4423
## lo_ground_truth:meansTRUE               6581     4439
## lo_ground_truth:sd_diff15               5686     4288
## meansTRUE:sd_diff15                     5234     4241
## lo_ground_truth:meansTRUE:sd_diff15     5604     4249
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth, worker_id, means, sd_diff) %>%
  add_predicted_draws(m.wrkr.means.sd.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

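Running a leave-one-out (LOO) posterior predictive check, we can see that overall this model has decent predictive validity, although the Pareto k warning below indicates that the importance-sampling weights are unreliable for some observations.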

# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.wrkr.means.sd.llo)

# run LOO to get weights
loo <- loo(m.wrkr.means.sd.llo, save_psis = TRUE, cores = 2)
## Warning: Found 208 observations with a pareto_k > 0.7 in model
## 'm.wrkr.means.sd.llo'. With this many problematic observations, it may be more
## appropriate to use 'kfold' with argument 'K = 10' to perform 10-fold cross-
## validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff) %>%
  add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

What does this look like in probability units?

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff) %>%
  add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.

model_check_df %>%
  add_predicted_draws(m.wrkr.means.sd.llo, n = 500) %>%
  group_by(lo_ground_truth, worker_id) %>%
  summarise(
    p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
    z_residual = qnorm(p_residual)             # what are the z-scores of these cumulative probabilities?
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline() +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

These still look pretty terrible.

What does the posterior for the slope of the LLO model look like when means are present vs absent at different levels of uncertainty, ignoring other manipulations?

model_df %>%
  group_by(means, sd_diff) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.wrkr.means.sd.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, .draw) %>%               # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # average within each group (with no weights supplied, this is a plain mean)
  ggplot(aes(x = slope, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes for mean present/absent") +
  theme(panel.grid = element_blank()) +
  facet_grid(. ~ sd_diff)

Recall that a slope of 1 represents no bias. Overall, people seem less biased at baseline when uncertainty is higher. With regard to the interaction, we see about what we expect: adding means makes responses less biased when uncertainty is high. However, we also expected to see the opposite, that adding means would make people more biased when uncertainty is low. Maybe this will turn out only to be the case for some uncertainty visualization formats rather than across the board.
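
As a back-of-the-envelope check on this reading, we can add up the rounded posterior means from the model summary to approximate the slope in each cell. This is only a sketch: summing point estimates ignores posterior uncertainty and correlations between parameters, so the per-draw calculation above remains the one to trust. The variable names are ours, and "ref" stands for the reference level of sd_diff.

```r
# rounded posterior means copied from the summary of m.wrkr.means.sd.llo
b_gt          <- 0.48   # lo_ground_truth
b_gt_means    <- 0.00   # lo_ground_truth:meansTRUE (printed as -0.00)
b_gt_sd       <- 0.11   # lo_ground_truth:sd_diff15
b_gt_means_sd <- 0.03   # lo_ground_truth:meansTRUE:sd_diff15

slopes <- c(
  sd_ref_means_absent  = b_gt,
  sd_ref_means_present = b_gt + b_gt_means,
  sd_15_means_absent   = b_gt + b_gt_sd,
  sd_15_means_present  = b_gt + b_gt_means + b_gt_sd + b_gt_means_sd
)

round(slopes, 2)
```

The approximate slopes are 0.48 and 0.48 at the reference level of sd_diff, and 0.59 and 0.62 when sd_diff is 15, which matches the pattern in the chart.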

Visualization Condition

The other thing we really want to know about is the impact of visualization condition on the slopes of linear models in log odds space. Do some visualizations lead to more extreme patterns of bias than others? To test this, we’ll add an interaction between visualization condition and the ground truth. Now we have all our predictors of interest in one model (i.e., this will be the minimal model required to answer our research questions).
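
To get a sense of how many population-level coefficients this four-way interaction implies, here is a small base R sketch that builds the design matrix for a toy grid. The level name "ref" is a placeholder for the reference levels of sd_diff and condition, which are not named in the model output; the other level names are taken from the coefficient names in the summary.

```r
# toy grid covering every combination of predictor levels
grid <- expand.grid(
  lo_ground_truth = c(0, 1),
  means           = c(FALSE, TRUE),
  sd_diff         = factor(c("ref", "15"), levels = c("ref", "15")),
  condition       = factor(c("ref", "HOPs", "intervals", "QDPs"),
                           levels = c("ref", "HOPs", "intervals", "QDPs"))
)

# design matrix for the mean structure of the minimal model
X <- model.matrix(~ lo_ground_truth * means * sd_diff * condition, data = grid)

n_coefs  <- ncol(X)                                        # all coefficients on the mean
n_slopes <- sum(grepl("lo_ground_truth", colnames(X)))     # terms that modify the LLO slope

c(n_coefs = n_coefs, n_slopes = n_slopes)
```

This gives 32 coefficients on the mean, 16 of which involve lo_ground_truth and therefore contribute to the slope in some cell of the design. That is why we rely on fitted draws rather than manual sums to get marginal slopes.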

We use the same priors as we did for the previous model. Now, let’s fit the model to our data.

# minimal LLO model
m.m.llo <- brm(data = model_df, family = "gaussian",
               formula = bf(lo_p_sup ~  (1 + lo_ground_truth|sharecor|worker_id) + lo_ground_truth*means*sd_diff*condition,
                            sigma ~ (1|sharecor|worker_id)),
               prior = c(prior(normal(1, 0.5), class = b),
                         prior(normal(1.3, 1), class = Intercept),
                         prior(normal(0, 0.15), class = sd, group = worker_id),
                         # prior(normal(0, 0.3), class = b, dpar = sigma),
                         prior(normal(0, 0.15), class = sd, dpar = sigma),
                         prior(lkj(4), class = cor)),
               iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
               control = list(adapt_delta = 0.99, max_treedepth = 12),
               file = "model-fits/llo_mdl-minimal")

Check diagnostics:

  • Trace plots
# trace plots
plot(m.m.llo)

  • Pairs plot
# pairs plot (intercepts)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_Intercept",
                                            "b_lo_ground_truth",
                                            "b_meansTRUE",
                                            "b_sd_diff15",
                                            "b_conditionintervals",
                                            "b_meansTRUE:sd_diff15",
                                            "b_meansTRUE:conditionintervals",
                                            "b_sd_diff15:conditionintervals",
                                            "b_meansTRUE:sd_diff15:conditionintervals"))

# pairs plot (LLO slopes)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_lo_ground_truth:meansTRUE",
                                            "b_lo_ground_truth:sd_diff15",
                                            "b_lo_ground_truth:conditionintervals",
                                            "b_lo_ground_truth:meansTRUE:sd_diff15",
                                            "b_lo_ground_truth:meansTRUE:conditionintervals",
                                            "b_lo_ground_truth:sd_diff15:conditionintervals",
                                            "b_lo_ground_truth:meansTRUE:sd_diff15:conditionintervals"))

# pairs plot (random effects)
pairs(m.m.llo, exact_match = TRUE, pars = c("b_sigma_Intercept",
                                            "sd_worker_id__Intercept", 
                                            "sd_worker_id__lo_ground_truth",
                                            "sd_worker_id__sigma_Intercept"))

pairs(m.m.llo, pars = c("cor_worker_id__"))

  • Summary
# model summary
print(m.m.llo)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth | sharecor | worker_id) + lo_ground_truth * means * sd_diff * condition 
##          sigma ~ (1 | sharecor | worker_id)
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                      Estimate Est.Error l-95% CI u-95% CI Rhat
## sd(Intercept)                            0.45      0.02     0.41     0.50 1.00
## sd(lo_ground_truth)                      0.45      0.01     0.42     0.48 1.00
## sd(sigma_Intercept)                      0.86      0.02     0.81     0.90 1.00
## cor(Intercept,lo_ground_truth)          -0.26      0.05    -0.34    -0.17 1.00
## cor(Intercept,sigma_Intercept)          -0.42      0.04    -0.50    -0.33 1.00
## cor(lo_ground_truth,sigma_Intercept)     0.60      0.03     0.55     0.66 1.00
##                                      Bulk_ESS Tail_ESS
## sd(Intercept)                            3094     6130
## sd(lo_ground_truth)                      1764     3623
## sd(sigma_Intercept)                      4127     6551
## cor(Intercept,lo_ground_truth)           1764     3842
## cor(Intercept,sigma_Intercept)           2206     4631
## cor(lo_ground_truth,sigma_Intercept)     4429     6923
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                 -0.20      0.04
## sigma_Intercept                                           -0.83      0.04
## lo_ground_truth                                            0.50      0.03
## meansTRUE                                                 -0.02      0.02
## sd_diff15                                                  0.04      0.02
## conditionHOPs                                              0.07      0.06
## conditionintervals                                        -0.07      0.05
## conditionQDPs                                              0.12      0.05
## lo_ground_truth:meansTRUE                                  0.01      0.01
## lo_ground_truth:sd_diff15                                  0.10      0.01
## meansTRUE:sd_diff15                                        0.04      0.02
## lo_ground_truth:conditionHOPs                             -0.12      0.05
## lo_ground_truth:conditionintervals                        -0.04      0.04
## lo_ground_truth:conditionQDPs                              0.07      0.04
## meansTRUE:conditionHOPs                                    0.04      0.03
## meansTRUE:conditionintervals                               0.02      0.02
## meansTRUE:conditionQDPs                                   -0.00      0.02
## sd_diff15:conditionHOPs                                    0.03      0.03
## sd_diff15:conditionintervals                              -0.00      0.02
## sd_diff15:conditionQDPs                                   -0.04      0.02
## lo_ground_truth:meansTRUE:sd_diff15                        0.03      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.02      0.02
## lo_ground_truth:meansTRUE:conditionintervals              -0.02      0.01
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.00      0.01
## lo_ground_truth:sd_diff15:conditionHOPs                    0.04      0.02
## lo_ground_truth:sd_diff15:conditionintervals              -0.00      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                    0.04      0.01
## meansTRUE:sd_diff15:conditionHOPs                         -0.01      0.04
## meansTRUE:sd_diff15:conditionintervals                    -0.03      0.03
## meansTRUE:sd_diff15:conditionQDPs                         -0.02      0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.01      0.02
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.27    -0.12 1.00
## sigma_Intercept                                           -0.90    -0.76 1.00
## lo_ground_truth                                            0.44     0.57 1.00
## meansTRUE                                                 -0.05     0.01 1.00
## sd_diff15                                                  0.01     0.07 1.00
## conditionHOPs                                             -0.04     0.18 1.00
## conditionintervals                                        -0.18     0.03 1.00
## conditionQDPs                                              0.02     0.23 1.00
## lo_ground_truth:meansTRUE                                 -0.01     0.02 1.00
## lo_ground_truth:sd_diff15                                  0.08     0.12 1.00
## meansTRUE:sd_diff15                                       -0.00     0.08 1.00
## lo_ground_truth:conditionHOPs                             -0.21    -0.03 1.00
## lo_ground_truth:conditionintervals                        -0.13     0.04 1.00
## lo_ground_truth:conditionQDPs                             -0.01     0.15 1.00
## meansTRUE:conditionHOPs                                   -0.01     0.10 1.00
## meansTRUE:conditionintervals                              -0.02     0.06 1.00
## meansTRUE:conditionQDPs                                   -0.04     0.04 1.00
## sd_diff15:conditionHOPs                                   -0.02     0.08 1.00
## sd_diff15:conditionintervals                              -0.04     0.04 1.00
## sd_diff15:conditionQDPs                                   -0.09     0.00 1.00
## lo_ground_truth:meansTRUE:sd_diff15                       -0.00     0.05 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.05     0.02 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.05     0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.03     0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.01     0.07 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.03     0.03 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                    0.01     0.07 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.09     0.06 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.09     0.02 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.08     0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.07     0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.02     0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.04     0.03 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  1793     3810
## sigma_Intercept                                            1942     3613
## lo_ground_truth                                             966     2186
## meansTRUE                                                  5644     7812
## sd_diff15                                                  5363     7543
## conditionHOPs                                              2261     3794
## conditionintervals                                         1896     3554
## conditionQDPs                                              1953     3552
## lo_ground_truth:meansTRUE                                  5462     7786
## lo_ground_truth:sd_diff15                                  5334     7669
## meansTRUE:sd_diff15                                        5078     7243
## lo_ground_truth:conditionHOPs                              1264     2569
## lo_ground_truth:conditionintervals                         1023     2577
## lo_ground_truth:conditionQDPs                              1057     2069
## meansTRUE:conditionHOPs                                    6479     8398
## meansTRUE:conditionintervals                               6274     8161
## meansTRUE:conditionQDPs                                    5962     8272
## sd_diff15:conditionHOPs                                    6523     8395
## sd_diff15:conditionintervals                               6088     8214
## sd_diff15:conditionQDPs                                    5799     8116
## lo_ground_truth:meansTRUE:sd_diff15                        4816     6637
## lo_ground_truth:meansTRUE:conditionHOPs                    6561     8511
## lo_ground_truth:meansTRUE:conditionintervals               6201     8234
## lo_ground_truth:meansTRUE:conditionQDPs                    6154     8418
## lo_ground_truth:sd_diff15:conditionHOPs                    6523     8631
## lo_ground_truth:sd_diff15:conditionintervals               5991     7867
## lo_ground_truth:sd_diff15:conditionQDPs                    5894     8005
## meansTRUE:sd_diff15:conditionHOPs                          6118     8078
## meansTRUE:sd_diff15:conditionintervals                     5874     7813
## meansTRUE:sd_diff15:conditionQDPs                          5598     7665
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          6213     8160
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     5743     7297
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          5529     8172
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
  add_predicted_draws(m.m.llo, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

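Running a leave-one-out (LOO) posterior predictive check, we can see that overall this model has decent predictive validity, although the Pareto k warning below (198 observations above 0.7) means the LOO weights should be read with some caution.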

# set up data for LOO posterior predictive check
y <- model_df$lo_p_sup
yrep <- posterior_predict(m.m.llo)

# run LOO to get weights
loo <- loo(m.m.llo, save_psis = TRUE, cores = 2)
## Warning: Found 198 observations with a pareto_k > 0.7 in model 'm.m.llo'. With
## this many problematic observations, it may be more appropriate to use 'kfold'
## with argument 'K = 10' to perform 10-fold cross-validation rather than LOO.
psis <- loo$psis_object
lw <- weights(psis)
ppc_loo_pit_qq(y, yrep, lw = lw)

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
  add_predicted_draws(m.m.llo, n = 500) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

What does this look like in probability units?

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff, condition) %>%
  add_predicted_draws(m.m.llo, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

To examine more closely whether our model has predictive validity at the level of each worker, we’ll look at QQ plots for residuals at the worker level.

model_check_df %>%
  add_predicted_draws(m.m.llo, n = 500) %>%
  group_by(lo_ground_truth, worker_id) %>%
  summarise(
    p_residual = mean(.prediction < lo_p_sup), # what proportion of predicted judgments are less than the observed response?
    z_residual = qnorm(p_residual)             # what are the z-scores of these cumulative probabilities?
  ) %>%
  ggplot(aes(sample = z_residual)) +
  geom_qq() +
  geom_abline() +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

These still look pretty terrible.
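
As a reference point for reading these QQ plots, here is a minimal sketch of the same z-residual calculation on simulated data (not our experimental data). All values are illustrative assumptions, and the clamping step is our addition to avoid infinite z-scores when every draw falls on one side of the observation. When the predictive distribution matches the data-generating process, the z-residuals should look roughly standard normal; when the predictive distribution is too wide, they bunch up around zero.

```r
# Sketch: z-residuals from predictive draws, using simulated (not experimental) data
set.seed(1234)

n_obs   <- 200   # hypothetical number of observed responses
n_draws <- 500   # predictive draws per observation, as in the checks above

# simulate "observed" responses from a known normal distribution
mu    <- rnorm(n_obs, mean = 0, sd = 1)     # hypothetical per-observation means
y_obs <- rnorm(n_obs, mean = mu, sd = 0.5)  # true residual sd is 0.5

# compute z-residuals for a predictive distribution with a given sd
z_residuals <- function(pred_sd) {
  p <- vapply(seq_len(n_obs), function(i) {
    draws <- rnorm(n_draws, mean = mu[i], sd = pred_sd)
    mean(draws < y_obs[i])  # proportion of draws below the observed response
  }, numeric(1))
  # clamp away from 0 and 1 so that qnorm() stays finite (our addition)
  p <- pmin(pmax(p, 1 / (2 * n_draws)), 1 - 1 / (2 * n_draws))
  qnorm(p)
}

z_calibrated    <- z_residuals(pred_sd = 0.5)  # predictive sd matches the data
z_overdispersed <- z_residuals(pred_sd = 1.5)  # predictive sd is too wide

sd(z_calibrated)     # should be close to 1
sd(z_overdispersed)  # should be well below 1
```

In a QQ plot, the calibrated case would fall near the identity line, while the overdispersed case would trace a line with a much shallower slope.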

What does the posterior for the slope of the LLO model look like when means are present vs absent at different levels of uncertainty, ignoring other manipulations?

model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, .draw) %>%               # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # marginalize out visualization condition by taking a weighted average
  ggplot(aes(x = slope, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes for mean present/absent") +
  theme(panel.grid = element_blank()) +
  facet_grid(. ~ sd_diff)

This effect suggests that adding means has a debiasing effect on average when visualized uncertainty is high (marginalizing across visualization conditions). Again, this is about what we expected to see. However, we expected the mean to have a biasing effect when uncertainty is low.
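
The slope calculation above relies on the fact that, for a linear model, the difference between fitted values at a predictor value of 1 and a predictor value of 0 is the slope. Here is a minimal sketch of that identity using an ordinary least squares fit on simulated data (the data and coefficient values are illustrative assumptions). In the Bayesian model, the same subtraction is applied within each posterior draw, which yields a posterior distribution over slopes.

```r
# Sketch: slope as the difference between fitted values at x = 1 and x = 0
set.seed(1234)

# simulated log odds data with a known slope of 0.5 (values are illustrative)
lo_ground_truth <- runif(100, min = -3, max = 3)
lo_p_sup <- 0.1 + 0.5 * lo_ground_truth + rnorm(100, sd = 0.3)
fit <- lm(lo_p_sup ~ lo_ground_truth)

# fitted values at lo_ground_truth = 0 and lo_ground_truth = 1
fits <- predict(fit, newdata = data.frame(lo_ground_truth = c(0, 1)))

slope_from_fits <- unname(fits[2] - fits[1])
slope_from_coef <- unname(coef(fit)["lo_ground_truth"])

all.equal(slope_from_fits, slope_from_coef)  # TRUE
```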

Let’s look at this difference in a forest plot style display.

model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%                    # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%            # calculate the difference between fits at 1 and 0 (i.e., slope)
  compare_levels(.value, by = means) %>%                      # look at differences in slopes between means present vs absent
  rename(slope_diff = .value) %>%
  group_by(sd_diff, .draw) %>%                                # group by predictors to keep
  summarise(slope_diff = weighted.mean(slope_diff)) %>%       # marginalize out visualization condition by taking a weighted average
  ggplot(aes(x = slope_diff, y = sd_diff)) +
  stat_halfeyeh() +
  scale_x_continuous(expression(slope_diff), expand = c(0, 0)) +
  labs(subtitle = "Posterior differences in slopes for means present vs absent") +
  theme_bw()
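
The marginalization step in these pipelines averages within each posterior draw before summarizing, which keeps a full posterior for the marginal quantity. Here is a minimal sketch of that idea in base R. The condition labels and slope values are placeholders, not estimates from our model. Note that `weighted.mean()` called without weights is just the arithmetic mean, so each condition contributes equally.

```r
# Sketch: marginalizing over conditions within each posterior draw
set.seed(1234)

n_draws    <- 1000
conditions <- c("A", "B", "C", "D")  # placeholder condition labels

# hypothetical posterior draws of the slope:
# one row per draw, one column per condition
slope_draws <- sapply(
  c(0.40, 0.50, 0.55, 0.60),
  function(m) rnorm(n_draws, mean = m, sd = 0.03)
)
colnames(slope_draws) <- conditions

# average across conditions within each draw;
# with no weights supplied, weighted.mean() equals mean()
marginal_slope <- apply(slope_draws, 1, weighted.mean)

# summarize only after marginalizing, so uncertainty is propagated
quantile(marginal_slope, c(0.025, 0.5, 0.975))
```

Averaging posterior means across conditions would give the same point estimate, but not the interval.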

What does the posterior for the slope in each visualization condition look like, marginalizing across other factors?

model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, .draw) %>%                    # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # marginalize out means present/absent and sd_diff by taking a weighted average
  ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank())

Recall that a slope of 1 on the logit scale reflects no bias. This suggests that users are biased toward responses of 50% on the probability scale in all conditions but to different degrees. Quantile dotplots seem to have a substantial debiasing effect on effect size judgments when we marginalize across other manipulations.
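
To make the link between slope and bias concrete, here is a minimal sketch of the response function implied by a linear log odds model. The helper `llo_response()` is a hypothetical name introduced for illustration, and the slope of 0.5 is a round number chosen for the example rather than an estimate from our model.

```r
# Sketch: how an LLO slope maps true probabilities to expected responses
# llo_response() is a hypothetical helper, not part of any package
llo_response <- function(p_true, slope, intercept = 0) {
  plogis(intercept + slope * qlogis(p_true))
}

p_true <- c(0.1, 0.25, 0.5, 0.75, 0.9)

# slope of 1 with intercept 0: responses equal the ground truth
llo_response(p_true, slope = 1)

# slope of 0.5: responses are pulled toward 0.5
# approximately 0.25, 0.366, 0.5, 0.634, 0.75
llo_response(p_true, slope = 0.5)
```

With a slope below 1, a ground truth of 90% maps to a response closer to 50%, which is the pattern of underestimating effect size described above.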

What if we break these marginal effects down into simple effects for the interaction of the presence/absence of the mean, level of visualized uncertainty, and visualization condition?

model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  ggplot(aes(x = slope, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes for means * sd * visualization condition") +
  theme(panel.grid = element_blank()) +
  facet_grid(condition ~ sd_diff)

Again, this is what we expected to see. However, it is not completely clear from this chart if the simple effect of extrinsic means is reliable in some conditions.

Let’s look at the differences in a forest plot style display, which should make the reliability of these differences a little easier to estimate visually.

model_df %>%
  group_by(means, sd_diff, condition) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.m.llo, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  compare_levels(.value, by = means) %>%                 # look at differences in slopes between means present vs absent
  rename(slope_diff = .value) %>%
  unite(cond, condition, sd_diff, sep = "_", remove = FALSE) %>%
  ggplot(aes(x = slope_diff, y = cond)) +
  stat_halfeyeh() +
  scale_x_continuous(expression(slope_diff), expand = c(0, 0)) +
  labs(subtitle = "Posterior differences in slopes for means present vs absent") +
  theme_bw()

What is the predicted pattern for responses for the average worker in each cell of this interaction?

model_df %>%
  group_by(lo_ground_truth, means, sd_diff, condition) %>%
  add_predicted_draws(m.m.llo, re_formula = NA, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(.prediction), color = means, fill = means)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95), alpha = .25) +
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid.minor = element_blank()) + 
  facet_grid(condition ~ sd_diff)

In these plots of the overall response function, we can see that the difference in performance induced by the mean is small relative to the difference between visualization conditions. We can also see that people are by far the least likely to underestimate effect size with quantile dotplots.

Next, we’ll try to get more precise estimates by expanding our random effects to include all of the within-subjects manipulations in our study design.

Building Up Random Effects for Within-Subjects Manipulations

In the minimal model to answer our research questions above, estimates for the effect of means are noisier than we would like, and predictive validity within subjects is not great. We’ll try to better account for heterogeneity across subjects by adding more random effects to our model for each within-subjects manipulation.

Following a principle of model expansion, we will make these changes cumulatively. We include a series of model specifications that capture plausible structure in the data and fit without any sampling issues.

Random Effects for the Interaction of Means and Uncertainty Shown

This first model adds random effects for the within-subjects manipulations in our previous model. We prioritize the interaction between showing means and the level of uncertainty in the distributions since we had a hypothesis about this. We omit the interaction between these terms and the ground truth in the random effects specification because of fit issues: we are unable to identify the random effect of ground truth, means, and level of uncertainty with only one observation per unique combination of these variables per worker.
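
To see which per-worker terms the `means*sd_diff` part of the random effects formula expands to, here is a minimal sketch using `model.matrix()` from base R. It assumes that `means` is logical and that `sd_diff` is a two-level factor whose levels are 5 and 15. The level 15 matches the coefficient names in the model summaries, but the reference level of 5 is an assumption made for illustration.

```r
# Sketch: which terms does means * sd_diff expand to?
# Assumption: means is logical; sd_diff is a factor with levels "5" and "15"
design <- expand.grid(
  means   = c(FALSE, TRUE),
  sd_diff = factor(c(5, 15))
)

# design matrix for the interaction of means and sd_diff
X <- model.matrix(~ means * sd_diff, data = design)

colnames(X)
# "(Intercept)" "meansTRUE" "sd_diff15" "meansTRUE:sd_diff15"
```

Each of these columns, along with `lo_ground_truth`, gets its own per-worker standard deviation in the group-level effects.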

# minimal LLO model with random effects for means and sd_diff
m.m.llo.r_means.sd <- brm(data = model_df, family = "gaussian",
                          formula = bf(lo_p_sup ~  (1 + lo_ground_truth + means*sd_diff|sharecor|worker_id) + lo_ground_truth*means*sd_diff*condition,
                                      sigma ~ (1|sharecor|worker_id)),
                          prior = c(prior(normal(1, 0.5), class = b),
                                    prior(normal(1.3, 1), class = Intercept),
                                    prior(normal(0, 0.15), class = sd, group = worker_id),
                                    # prior(normal(0, 0.3), class = b, dpar = sigma),
                                    prior(normal(0, 0.15), class = sd, dpar = sigma),
                                    prior(lkj(4), class = cor)),
                          iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
                          control = list(adapt_delta = 0.99, max_treedepth = 12),
                          file = "model-fits/llo_mdl-min-r_means_sd")
summary(m.m.llo.r_means.sd)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | sharecor | worker_id) + lo_ground_truth * means * sd_diff * condition 
##          sigma ~ (1 | sharecor | worker_id)
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                          Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                                0.53      0.02     0.48     0.58
## sd(lo_ground_truth)                          0.45      0.01     0.42     0.48
## sd(meansTRUE)                                0.22      0.02     0.19     0.26
## sd(sd_diff15)                                0.17      0.01     0.15     0.19
## sd(meansTRUE:sd_diff15)                      0.09      0.01     0.07     0.12
## sd(sigma_Intercept)                          0.87      0.02     0.83     0.92
## cor(Intercept,lo_ground_truth)              -0.22      0.05    -0.31    -0.12
## cor(Intercept,meansTRUE)                     0.09      0.09    -0.08     0.26
## cor(lo_ground_truth,meansTRUE)              -0.32      0.07    -0.45    -0.19
## cor(Intercept,sd_diff15)                    -0.55      0.07    -0.68    -0.40
## cor(lo_ground_truth,sd_diff15)               0.06      0.08    -0.09     0.21
## cor(meansTRUE,sd_diff15)                    -0.17      0.09    -0.35     0.01
## cor(Intercept,meansTRUE:sd_diff15)          -0.40      0.16    -0.68    -0.06
## cor(lo_ground_truth,meansTRUE:sd_diff15)     0.32      0.13     0.04     0.57
## cor(meansTRUE,meansTRUE:sd_diff15)           0.39      0.15     0.08     0.67
## cor(sd_diff15,meansTRUE:sd_diff15)          -0.04      0.14    -0.30     0.23
## cor(Intercept,sigma_Intercept)              -0.33      0.04    -0.41    -0.24
## cor(lo_ground_truth,sigma_Intercept)         0.62      0.03     0.57     0.67
## cor(meansTRUE,sigma_Intercept)              -0.35      0.06    -0.46    -0.24
## cor(sd_diff15,sigma_Intercept)               0.25      0.07     0.12     0.38
## cor(meansTRUE:sd_diff15,sigma_Intercept)     0.30      0.11     0.07     0.51
##                                          Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                            1.00     4744     7044
## sd(lo_ground_truth)                      1.00     1817     4376
## sd(meansTRUE)                            1.00     3855     6866
## sd(sd_diff15)                            1.00     7043     8325
## sd(meansTRUE:sd_diff15)                  1.00     5959     8574
## sd(sigma_Intercept)                      1.00     5236     8258
## cor(Intercept,lo_ground_truth)           1.00     1744     3359
## cor(Intercept,meansTRUE)                 1.00     4785     8137
## cor(lo_ground_truth,meansTRUE)           1.00     5443     8071
## cor(Intercept,sd_diff15)                 1.00     7985     8401
## cor(lo_ground_truth,sd_diff15)           1.00     5183     7860
## cor(meansTRUE,sd_diff15)                 1.00     5849     7964
## cor(Intercept,meansTRUE:sd_diff15)       1.00     8132     8301
## cor(lo_ground_truth,meansTRUE:sd_diff15) 1.00     5901     8257
## cor(meansTRUE,meansTRUE:sd_diff15)       1.00     5892     7990
## cor(sd_diff15,meansTRUE:sd_diff15)       1.00     8053     9032
## cor(Intercept,sigma_Intercept)           1.00     2686     4759
## cor(lo_ground_truth,sigma_Intercept)     1.00     5610     8158
## cor(meansTRUE,sigma_Intercept)           1.00     3260     6721
## cor(sd_diff15,sigma_Intercept)           1.00     3166     6162
## cor(meansTRUE:sd_diff15,sigma_Intercept) 1.00     1588     4597
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                 -0.19      0.05
## sigma_Intercept                                           -0.90      0.04
## lo_ground_truth                                            0.51      0.03
## meansTRUE                                                 -0.10      0.03
## sd_diff15                                                  0.06      0.02
## conditionHOPs                                              0.06      0.07
## conditionintervals                                        -0.11      0.06
## conditionQDPs                                              0.13      0.06
## lo_ground_truth:meansTRUE                                  0.00      0.01
## lo_ground_truth:sd_diff15                                  0.09      0.01
## meansTRUE:sd_diff15                                        0.08      0.03
## lo_ground_truth:conditionHOPs                             -0.13      0.04
## lo_ground_truth:conditionintervals                        -0.05      0.04
## lo_ground_truth:conditionQDPs                              0.06      0.04
## meansTRUE:conditionHOPs                                    0.06      0.04
## meansTRUE:conditionintervals                               0.02      0.03
## meansTRUE:conditionQDPs                                   -0.01      0.03
## sd_diff15:conditionHOPs                                    0.06      0.03
## sd_diff15:conditionintervals                               0.06      0.03
## sd_diff15:conditionQDPs                                   -0.03      0.03
## lo_ground_truth:meansTRUE:sd_diff15                        0.03      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.01      0.02
## lo_ground_truth:meansTRUE:conditionintervals              -0.02      0.01
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.00      0.01
## lo_ground_truth:sd_diff15:conditionHOPs                    0.05      0.02
## lo_ground_truth:sd_diff15:conditionintervals              -0.00      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                    0.04      0.01
## meansTRUE:sd_diff15:conditionHOPs                         -0.02      0.04
## meansTRUE:sd_diff15:conditionintervals                    -0.02      0.03
## meansTRUE:sd_diff15:conditionQDPs                         -0.03      0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.01      0.02
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.28    -0.10 1.00
## sigma_Intercept                                           -0.97    -0.83 1.00
## lo_ground_truth                                            0.45     0.58 1.00
## meansTRUE                                                 -0.15    -0.05 1.00
## sd_diff15                                                  0.02     0.11 1.00
## conditionHOPs                                             -0.07     0.19 1.00
## conditionintervals                                        -0.23     0.01 1.00
## conditionQDPs                                              0.01     0.25 1.00
## lo_ground_truth:meansTRUE                                 -0.02     0.02 1.00
## lo_ground_truth:sd_diff15                                  0.08     0.11 1.00
## meansTRUE:sd_diff15                                        0.03     0.13 1.00
## lo_ground_truth:conditionHOPs                             -0.22    -0.04 1.00
## lo_ground_truth:conditionintervals                        -0.13     0.04 1.00
## lo_ground_truth:conditionQDPs                             -0.02     0.14 1.00
## meansTRUE:conditionHOPs                                   -0.01     0.14 1.00
## meansTRUE:conditionintervals                              -0.05     0.09 1.00
## meansTRUE:conditionQDPs                                   -0.07     0.06 1.00
## sd_diff15:conditionHOPs                                   -0.01     0.12 1.00
## sd_diff15:conditionintervals                              -0.00     0.12 1.00
## sd_diff15:conditionQDPs                                   -0.09     0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15                        0.00     0.05 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.04     0.02 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.04     0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.02     0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.02     0.08 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.03     0.02 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                    0.01     0.06 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.10     0.05 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.08     0.04 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.09     0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.07     0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.01     0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.04     0.03 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  2178     3483
## sigma_Intercept                                            1806     4444
## lo_ground_truth                                             994     2310
## meansTRUE                                                  3578     6955
## sd_diff15                                                  3726     6032
## conditionHOPs                                              2638     4510
## conditionintervals                                         2137     3919
## conditionQDPs                                              1899     4165
## lo_ground_truth:meansTRUE                                  7581     8731
## lo_ground_truth:sd_diff15                                  6164     8394
## meansTRUE:sd_diff15                                        2916     6384
## lo_ground_truth:conditionHOPs                              1484     3123
## lo_ground_truth:conditionintervals                         1245     2880
## lo_ground_truth:conditionQDPs                              1126     2836
## meansTRUE:conditionHOPs                                    5553     7867
## meansTRUE:conditionintervals                               4722     7005
## meansTRUE:conditionQDPs                                    5302     7618
## sd_diff15:conditionHOPs                                    5675     8021
## sd_diff15:conditionintervals                               5095     7190
## sd_diff15:conditionQDPs                                    4898     7203
## lo_ground_truth:meansTRUE:sd_diff15                        7087     7121
## lo_ground_truth:meansTRUE:conditionHOPs                    7713     8511
## lo_ground_truth:meansTRUE:conditionintervals               7422     9157
## lo_ground_truth:meansTRUE:conditionQDPs                    7351     8335
## lo_ground_truth:sd_diff15:conditionHOPs                    7175     8510
## lo_ground_truth:sd_diff15:conditionintervals               6643     8050
## lo_ground_truth:sd_diff15:conditionQDPs                    6421     7804
## meansTRUE:sd_diff15:conditionHOPs                          7628     8604
## meansTRUE:sd_diff15:conditionintervals                     6466     8202
## meansTRUE:sd_diff15:conditionQDPs                          6951     8850
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          7429     8462
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     6859     8629
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          6832     7506
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Mixed Effects of Ground Truth on Sigma

In this model, we add fixed and random effects of ground truth to our sigma submodel. We add a conservative but informative prior in order to model fixed effects on sigma.
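
Because the sigma submodel uses a log link (see `sigma = log` in the model summaries), its coefficients act multiplicatively on the residual standard deviation. Here is a minimal sketch of that relationship. The helper name and the coefficient values are illustrative assumptions, not estimates from the fitted model.

```r
# Sketch: residual sd implied by a linear predictor on the log scale
# sigma_from_log_linear() is a hypothetical helper for illustration
sigma_from_log_linear <- function(lo_ground_truth, intercept, slope) {
  exp(intercept + slope * lo_ground_truth)
}

# illustrative coefficients only, not estimates from the fitted model
intercept <- -1.5
slope     <- 0.2

lo_ground_truth <- c(-2, 0, 2)
sigma <- sigma_from_log_linear(lo_ground_truth, intercept, slope)
sigma

# a one-unit increase in lo_ground_truth multiplies sigma by exp(slope)
sigma_from_log_linear(1, intercept, slope) / sigma_from_log_linear(0, intercept, slope)
```

One consequence of the log link is that sigma stays positive for any values of the coefficients, which is why priors on these terms are specified on the log scale.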

# minimal LLO model with random effects for means, sd_diff, as well as ground truth for sigma submodel
m.m.llo.r_means.sd.sigma_gt <- brm(data = model_df, family = "gaussian",
                                   formula = bf(lo_p_sup ~  (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition,
                                                sigma ~ (1 + lo_ground_truth|worker_id) + lo_ground_truth),
                                   prior = c(prior(normal(1, 0.5), class = b),
                                             prior(normal(1.3, 1), class = Intercept),
                                             prior(normal(0, 0.15), class = sd, group = worker_id),
                                             prior(normal(0, 0.3), class = b, dpar = sigma),
                                             prior(normal(0, 0.15), class = sd, dpar = sigma),
                                             prior(lkj(4), class = cor)),
                                   iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
                                   control = list(adapt_delta = 0.99, max_treedepth = 12),
                                   file = "model-fits/llo_mdl-min-r_means_sd_sigma_gt")
summary(m.m.llo.r_means.sd.sigma_gt)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition 
##          sigma ~ (1 + lo_ground_truth | worker_id) + lo_ground_truth
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                            Estimate Est.Error l-95% CI u-95% CI
## sd(Intercept)                                  0.08      0.01     0.06     0.09
## sd(lo_ground_truth)                            0.41      0.01     0.39     0.44
## sd(meansTRUE)                                  0.08      0.01     0.06     0.09
## sd(sd_diff15)                                  0.08      0.01     0.07     0.10
## sd(meansTRUE:sd_diff15)                        0.07      0.01     0.06     0.09
## sd(sigma_Intercept)                            1.24      0.03     1.18     1.31
## sd(sigma_lo_ground_truth)                      0.44      0.01     0.42     0.47
## cor(Intercept,lo_ground_truth)                -0.28      0.09    -0.45    -0.11
## cor(Intercept,meansTRUE)                      -0.39      0.10    -0.58    -0.17
## cor(lo_ground_truth,meansTRUE)                -0.55      0.08    -0.69    -0.39
## cor(Intercept,sd_diff15)                       0.11      0.11    -0.11     0.32
## cor(lo_ground_truth,sd_diff15)                -0.02      0.09    -0.21     0.16
## cor(meansTRUE,sd_diff15)                      -0.05      0.11    -0.27     0.17
## cor(Intercept,meansTRUE:sd_diff15)            -0.55      0.11    -0.75    -0.32
## cor(lo_ground_truth,meansTRUE:sd_diff15)       0.32      0.12     0.08     0.55
## cor(meansTRUE,meansTRUE:sd_diff15)             0.39      0.14     0.12     0.65
## cor(sd_diff15,meansTRUE:sd_diff15)            -0.35      0.11    -0.54    -0.12
## cor(sigma_Intercept,sigma_lo_ground_truth)    -0.73      0.02    -0.76    -0.69
##                                            Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                              1.00     4504     6619
## sd(lo_ground_truth)                        1.00     3435     6615
## sd(meansTRUE)                              1.00     2234     5709
## sd(sd_diff15)                              1.00     4784     6933
## sd(meansTRUE:sd_diff15)                    1.00     6150     8421
## sd(sigma_Intercept)                        1.00     2726     5021
## sd(sigma_lo_ground_truth)                  1.00     3833     6548
## cor(Intercept,lo_ground_truth)             1.00      474     1041
## cor(Intercept,meansTRUE)                   1.00     1542     4459
## cor(lo_ground_truth,meansTRUE)             1.00     4058     7074
## cor(Intercept,sd_diff15)                   1.00     3780     5306
## cor(lo_ground_truth,sd_diff15)             1.00     4688     7818
## cor(meansTRUE,sd_diff15)                   1.00     3444     5828
## cor(Intercept,meansTRUE:sd_diff15)         1.00     3324     6593
## cor(lo_ground_truth,meansTRUE:sd_diff15)   1.00     7082     8578
## cor(meansTRUE,meansTRUE:sd_diff15)         1.00     5380     7810
## cor(sd_diff15,meansTRUE:sd_diff15)         1.00     6824     8312
## cor(sigma_Intercept,sigma_lo_ground_truth) 1.00     4256     6455
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                 -0.00      0.01
## sigma_Intercept                                           -1.49      0.05
## lo_ground_truth                                            0.37      0.03
## meansTRUE                                                 -0.03      0.01
## sd_diff15                                                  0.03      0.01
## conditionHOPs                                             -0.04      0.02
## conditionintervals                                        -0.02      0.02
## conditionQDPs                                              0.02      0.02
## lo_ground_truth:meansTRUE                                 -0.01      0.01
## lo_ground_truth:sd_diff15                                  0.11      0.01
## meansTRUE:sd_diff15                                        0.02      0.02
## lo_ground_truth:conditionHOPs                             -0.03      0.05
## lo_ground_truth:conditionintervals                        -0.06      0.05
## lo_ground_truth:conditionQDPs                              0.13      0.05
## meansTRUE:conditionHOPs                                    0.02      0.02
## meansTRUE:conditionintervals                               0.02      0.02
## meansTRUE:conditionQDPs                                   -0.02      0.02
## sd_diff15:conditionHOPs                                    0.02      0.02
## sd_diff15:conditionintervals                               0.01      0.02
## sd_diff15:conditionQDPs                                   -0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15                        0.05      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                    0.01      0.02
## lo_ground_truth:meansTRUE:conditionintervals              -0.01      0.01
## lo_ground_truth:meansTRUE:conditionQDPs                    0.01      0.01
## lo_ground_truth:sd_diff15:conditionHOPs                    0.06      0.02
## lo_ground_truth:sd_diff15:conditionintervals              -0.01      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                    0.01      0.01
## meansTRUE:sd_diff15:conditionHOPs                          0.02      0.03
## meansTRUE:sd_diff15:conditionintervals                    -0.01      0.02
## meansTRUE:sd_diff15:conditionQDPs                          0.01      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.05      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.03      0.02
## sigma_lo_ground_truth                                      0.37      0.02
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.03     0.02 1.00
## sigma_Intercept                                           -1.60    -1.39 1.00
## lo_ground_truth                                            0.31     0.44 1.00
## meansTRUE                                                 -0.05    -0.00 1.00
## sd_diff15                                                  0.01     0.06 1.00
## conditionHOPs                                             -0.08    -0.01 1.00
## conditionintervals                                        -0.05     0.01 1.00
## conditionQDPs                                             -0.01     0.05 1.00
## lo_ground_truth:meansTRUE                                 -0.03     0.01 1.00
## lo_ground_truth:sd_diff15                                  0.09     0.13 1.00
## meansTRUE:sd_diff15                                       -0.00     0.05 1.00
## lo_ground_truth:conditionHOPs                             -0.13     0.07 1.00
## lo_ground_truth:conditionintervals                        -0.16     0.03 1.00
## lo_ground_truth:conditionQDPs                              0.04     0.23 1.00
## meansTRUE:conditionHOPs                                   -0.02     0.06 1.00
## meansTRUE:conditionintervals                              -0.01     0.05 1.00
## meansTRUE:conditionQDPs                                   -0.05     0.01 1.00
## sd_diff15:conditionHOPs                                   -0.03     0.06 1.00
## sd_diff15:conditionintervals                              -0.02     0.05 1.00
## sd_diff15:conditionQDPs                                   -0.05     0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15                        0.02     0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.02     0.05 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.04     0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.02     0.03 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.03     0.09 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.04     0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                   -0.02     0.04 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.04     0.07 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.05     0.03 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.03     0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.10    -0.01 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.02     0.05 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.07     0.00 1.00
## sigma_lo_ground_truth                                      0.33     0.40 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  5323     7694
## sigma_Intercept                                            1284     2414
## lo_ground_truth                                            3506     5632
## meansTRUE                                                  4770     7049
## sd_diff15                                                  5256     7418
## conditionHOPs                                              5876     7651
## conditionintervals                                         5103     7481
## conditionQDPs                                              5054     7952
## lo_ground_truth:meansTRUE                                  5052     7159
## lo_ground_truth:sd_diff15                                  5312     7748
## meansTRUE:sd_diff15                                        5271     7392
## lo_ground_truth:conditionHOPs                              3997     6671
## lo_ground_truth:conditionintervals                         3552     5534
## lo_ground_truth:conditionQDPs                              3307     5615
## meansTRUE:conditionHOPs                                    5579     7979
## meansTRUE:conditionintervals                               5013     7187
## meansTRUE:conditionQDPs                                    4883     6660
## sd_diff15:conditionHOPs                                    6009     7550
## sd_diff15:conditionintervals                               5345     7074
## sd_diff15:conditionQDPs                                    5530     7099
## lo_ground_truth:meansTRUE:sd_diff15                        4672     6898
## lo_ground_truth:meansTRUE:conditionHOPs                    5974     7838
## lo_ground_truth:meansTRUE:conditionintervals               5542     8088
## lo_ground_truth:meansTRUE:conditionQDPs                    5703     7616
## lo_ground_truth:sd_diff15:conditionHOPs                    6489     7472
## lo_ground_truth:sd_diff15:conditionintervals               5584     7933
## lo_ground_truth:sd_diff15:conditionQDPs                    6058     7697
## meansTRUE:sd_diff15:conditionHOPs                          6059     7540
## meansTRUE:sd_diff15:conditionintervals                     5890     7362
## meansTRUE:sd_diff15:conditionQDPs                          5597     8208
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          5476     7261
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     5209     7035
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          5105     7481
## sigma_lo_ground_truth                                      2108     3513
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
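
Because the sigma submodel uses a log link, its coefficients are easier to read after exponentiating. The sketch below converts the population-level estimates printed above (sigma_Intercept = -1.49, sigma_lo_ground_truth = 0.37) into an implied residual standard deviation on the log odds scale. It assumes that lo_ground_truth is the uncentered log odds of the ground truth probability, and it ignores the worker-level variation in sigma, which the summary suggests is large. The helper name residual_sd is ours, for illustration only.

```r
# Population-level estimates copied from the summary above
# (sigma submodel, log link).
sigma_intercept <- -1.49  # sigma_Intercept
sigma_slope <- 0.37       # sigma_lo_ground_truth

# Implied residual SD (log odds units) for an average worker.
# Assumption: lo_ground_truth = qlogis(ground truth probability), uncentered.
residual_sd <- function(p_true) {
  lo_ground_truth <- qlogis(p_true)
  exp(sigma_intercept + sigma_slope * lo_ground_truth)
}

p_true <- c(0.5, 0.75, 0.95)
data.frame(p_true = p_true, residual_sd = round(residual_sd(p_true), 3))
```

A positive sigma_lo_ground_truth means that, under these assumptions, responses become noisier as the ground truth moves toward higher log odds.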

Mixed Effects of the Interaction of Means and Uncertainty Shown on Sigma

We tried models that add mixed effects on the residual standard deviation (sigma) for the interaction of extrinsic means and uncertainty shown. None of the many variations we tried with this set of predictors produced a usable fit, and several versions ran for days before the chains finished sampling. The model below is the best version we managed to fit, but it still produces 182 divergent transitions after warmup. All of this suggests that we may be better off modeling these data without means*sd_diff as a predictor of sigma.

# minimal LLO model with random effects for means and sd_diff, plus ground truth, means, and sd_diff as predictors in the sigma submodel
m.m.llo.r_means.sd.sigma_gt.means.sd <- brm(
  data = model_df, family = "gaussian",
  formula = bf(lo_p_sup ~ (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition,
               sigma ~ (1 + lo_ground_truth + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_sigma_gt_means_sd")
summary(m.m.llo.r_means.sd.sigma_gt.means.sd)
## Warning: There were 182 divergent transitions after warmup. Increasing adapt_delta above 0.99 may help.
## See http://mc-stan.org/misc/warnings.html#divergent-transitions-after-warmup
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition 
##          sigma ~ (1 + lo_ground_truth + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                      Estimate Est.Error
## sd(Intercept)                                            0.05      0.01
## sd(lo_ground_truth)                                      0.39      0.01
## sd(meansTRUE)                                            0.06      0.01
## sd(sd_diff15)                                            0.07      0.01
## sd(meansTRUE:sd_diff15)                                  0.05      0.01
## sd(sigma_Intercept)                                      1.33      0.04
## sd(sigma_lo_ground_truth)                                0.40      0.01
## sd(sigma_meansTRUE)                                      0.91      0.03
## sd(sigma_sd_diff15)                                      0.59      0.03
## sd(sigma_meansTRUE:sd_diff15)                            0.63      0.04
## cor(Intercept,lo_ground_truth)                          -0.41      0.10
## cor(Intercept,meansTRUE)                                -0.33      0.12
## cor(lo_ground_truth,meansTRUE)                          -0.57      0.09
## cor(Intercept,sd_diff15)                                 0.16      0.11
## cor(lo_ground_truth,sd_diff15)                           0.01      0.10
## cor(meansTRUE,sd_diff15)                                -0.06      0.11
## cor(Intercept,meansTRUE:sd_diff15)                      -0.49      0.13
## cor(lo_ground_truth,meansTRUE:sd_diff15)                 0.31      0.15
## cor(meansTRUE,meansTRUE:sd_diff15)                       0.26      0.16
## cor(sd_diff15,meansTRUE:sd_diff15)                      -0.19      0.15
## cor(sigma_Intercept,sigma_lo_ground_truth)              -0.62      0.03
## cor(sigma_Intercept,sigma_meansTRUE)                    -0.15      0.04
## cor(sigma_lo_ground_truth,sigma_meansTRUE)              -0.14      0.05
## cor(sigma_Intercept,sigma_sd_diff15)                    -0.48      0.04
## cor(sigma_lo_ground_truth,sigma_sd_diff15)               0.11      0.05
## cor(sigma_meansTRUE,sigma_sd_diff15)                     0.15      0.05
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15)          -0.06      0.05
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15)     0.11      0.06
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15)          -0.55      0.04
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15)          -0.29      0.06
##                                                      l-95% CI u-95% CI Rhat
## sd(Intercept)                                            0.04     0.07 1.00
## sd(lo_ground_truth)                                      0.37     0.42 1.00
## sd(meansTRUE)                                            0.05     0.07 1.00
## sd(sd_diff15)                                            0.06     0.08 1.00
## sd(meansTRUE:sd_diff15)                                  0.04     0.07 1.00
## sd(sigma_Intercept)                                      1.26     1.41 1.00
## sd(sigma_lo_ground_truth)                                0.37     0.42 1.00
## sd(sigma_meansTRUE)                                      0.85     0.97 1.00
## sd(sigma_sd_diff15)                                      0.54     0.65 1.00
## sd(sigma_meansTRUE:sd_diff15)                            0.56     0.70 1.00
## cor(Intercept,lo_ground_truth)                          -0.58    -0.20 1.00
## cor(Intercept,meansTRUE)                                -0.54    -0.08 1.00
## cor(lo_ground_truth,meansTRUE)                          -0.73    -0.39 1.00
## cor(Intercept,sd_diff15)                                -0.06     0.39 1.00
## cor(lo_ground_truth,sd_diff15)                          -0.19     0.20 1.00
## cor(meansTRUE,sd_diff15)                                -0.28     0.16 1.00
## cor(Intercept,meansTRUE:sd_diff15)                      -0.73    -0.21 1.00
## cor(lo_ground_truth,meansTRUE:sd_diff15)                 0.01     0.57 1.00
## cor(meansTRUE,meansTRUE:sd_diff15)                      -0.07     0.57 1.00
## cor(sd_diff15,meansTRUE:sd_diff15)                      -0.45     0.12 1.00
## cor(sigma_Intercept,sigma_lo_ground_truth)              -0.67    -0.57 1.00
## cor(sigma_Intercept,sigma_meansTRUE)                    -0.23    -0.07 1.00
## cor(sigma_lo_ground_truth,sigma_meansTRUE)              -0.23    -0.05 1.00
## cor(sigma_Intercept,sigma_sd_diff15)                    -0.55    -0.40 1.00
## cor(sigma_lo_ground_truth,sigma_sd_diff15)               0.01     0.21 1.00
## cor(sigma_meansTRUE,sigma_sd_diff15)                     0.04     0.25 1.00
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15)          -0.16     0.05 1.00
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15)    -0.00     0.22 1.00
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15)          -0.63    -0.47 1.00
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15)          -0.41    -0.17 1.00
##                                                      Bulk_ESS Tail_ESS
## sd(Intercept)                                            1513     2905
## sd(lo_ground_truth)                                      1459     3766
## sd(meansTRUE)                                             516     1120
## sd(sd_diff15)                                            2547     5089
## sd(meansTRUE:sd_diff15)                                  1727     3595
## sd(sigma_Intercept)                                      1410     2622
## sd(sigma_lo_ground_truth)                                1826     4026
## sd(sigma_meansTRUE)                                      2303     4370
## sd(sigma_sd_diff15)                                      1827     3331
## sd(sigma_meansTRUE:sd_diff15)                            1854     3483
## cor(Intercept,lo_ground_truth)                            207      170
## cor(Intercept,meansTRUE)                                  444     1133
## cor(lo_ground_truth,meansTRUE)                           1064     2150
## cor(Intercept,sd_diff15)                                 1710     3420
## cor(lo_ground_truth,sd_diff15)                           2023     4948
## cor(meansTRUE,sd_diff15)                                 1015     2607
## cor(Intercept,meansTRUE:sd_diff15)                       2250     4260
## cor(lo_ground_truth,meansTRUE:sd_diff15)                 2895     6601
## cor(meansTRUE,meansTRUE:sd_diff15)                       1703     4274
## cor(sd_diff15,meansTRUE:sd_diff15)                       2231     5074
## cor(sigma_Intercept,sigma_lo_ground_truth)               2080     4120
## cor(sigma_Intercept,sigma_meansTRUE)                     1708     3566
## cor(sigma_lo_ground_truth,sigma_meansTRUE)               1287     2771
## cor(sigma_Intercept,sigma_sd_diff15)                     2935     5007
## cor(sigma_lo_ground_truth,sigma_sd_diff15)               2564     4403
## cor(sigma_meansTRUE,sigma_sd_diff15)                     1969     3275
## cor(sigma_Intercept,sigma_meansTRUE:sd_diff15)           2542     4288
## cor(sigma_lo_ground_truth,sigma_meansTRUE:sd_diff15)     2270     4723
## cor(sigma_meansTRUE,sigma_meansTRUE:sd_diff15)           3156     5458
## cor(sigma_sd_diff15,sigma_meansTRUE:sd_diff15)           1674     3586
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                  0.00      0.01
## sigma_Intercept                                           -1.75      0.06
## lo_ground_truth                                            0.35      0.03
## meansTRUE                                                 -0.03      0.01
## sd_diff15                                                  0.03      0.01
## conditionHOPs                                             -0.04      0.01
## conditionintervals                                        -0.01      0.01
## conditionQDPs                                              0.01      0.01
## lo_ground_truth:meansTRUE                                  0.00      0.01
## lo_ground_truth:sd_diff15                                  0.11      0.01
## meansTRUE:sd_diff15                                        0.02      0.01
## lo_ground_truth:conditionHOPs                             -0.08      0.05
## lo_ground_truth:conditionintervals                        -0.08      0.05
## lo_ground_truth:conditionQDPs                              0.14      0.05
## meansTRUE:conditionHOPs                                    0.03      0.01
## meansTRUE:conditionintervals                               0.02      0.01
## meansTRUE:conditionQDPs                                   -0.02      0.01
## sd_diff15:conditionHOPs                                    0.01      0.02
## sd_diff15:conditionintervals                               0.01      0.02
## sd_diff15:conditionQDPs                                    0.00      0.02
## lo_ground_truth:meansTRUE:sd_diff15                        0.05      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                    0.00      0.01
## lo_ground_truth:meansTRUE:conditionintervals              -0.01      0.01
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.00      0.01
## lo_ground_truth:sd_diff15:conditionHOPs                    0.07      0.02
## lo_ground_truth:sd_diff15:conditionintervals               0.00      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                   -0.00      0.01
## meansTRUE:sd_diff15:conditionHOPs                          0.03      0.02
## meansTRUE:sd_diff15:conditionintervals                     0.00      0.02
## meansTRUE:sd_diff15:conditionQDPs                          0.00      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.06      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.01      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.02      0.02
## sigma_lo_ground_truth                                      0.39      0.02
## sigma_meansTRUE                                           -0.16      0.05
## sigma_sd_diff15                                            0.21      0.04
## sigma_lo_ground_truth:meansTRUE                            0.00      0.02
## sigma_lo_ground_truth:sd_diff15                            0.02      0.02
## sigma_meansTRUE:sd_diff15                                  0.12      0.05
## sigma_lo_ground_truth:meansTRUE:sd_diff15                 -0.02      0.03
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.01     0.02 1.00
## sigma_Intercept                                           -1.87    -1.64 1.00
## lo_ground_truth                                            0.29     0.41 1.00
## meansTRUE                                                 -0.05    -0.01 1.00
## sd_diff15                                                  0.01     0.05 1.00
## conditionHOPs                                             -0.06    -0.01 1.00
## conditionintervals                                        -0.03     0.01 1.00
## conditionQDPs                                             -0.02     0.03 1.00
## lo_ground_truth:meansTRUE                                 -0.01     0.01 1.00
## lo_ground_truth:sd_diff15                                  0.09     0.13 1.00
## meansTRUE:sd_diff15                                       -0.01     0.05 1.00
## lo_ground_truth:conditionHOPs                             -0.17     0.01 1.01
## lo_ground_truth:conditionintervals                        -0.16     0.01 1.00
## lo_ground_truth:conditionQDPs                              0.05     0.23 1.00
## meansTRUE:conditionHOPs                                    0.01     0.06 1.00
## meansTRUE:conditionintervals                              -0.00     0.04 1.00
## meansTRUE:conditionQDPs                                   -0.04     0.01 1.00
## sd_diff15:conditionHOPs                                   -0.04     0.05 1.00
## sd_diff15:conditionintervals                              -0.02     0.04 1.00
## sd_diff15:conditionQDPs                                   -0.03     0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15                        0.02     0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.01     0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.03     0.01 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.02     0.01 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.03     0.10 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.02     0.03 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                   -0.03     0.02 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.02     0.07 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.03     0.04 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.03     0.04 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.11    -0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.05     0.02 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.05     0.02 1.00
## sigma_lo_ground_truth                                      0.35     0.44 1.00
## sigma_meansTRUE                                           -0.26    -0.07 1.00
## sigma_sd_diff15                                            0.14     0.29 1.00
## sigma_lo_ground_truth:meansTRUE                           -0.04     0.05 1.00
## sigma_lo_ground_truth:sd_diff15                           -0.02     0.06 1.00
## sigma_meansTRUE:sd_diff15                                  0.02     0.22 1.00
## sigma_lo_ground_truth:meansTRUE:sd_diff15                 -0.07     0.04 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  1454     2969
## sigma_Intercept                                             747     1042
## lo_ground_truth                                             847     2098
## meansTRUE                                                  1249     2739
## sd_diff15                                                  2251     3921
## conditionHOPs                                              2102     3570
## conditionintervals                                         1712     2627
## conditionQDPs                                              1263     3160
## lo_ground_truth:meansTRUE                                  3058     5845
## lo_ground_truth:sd_diff15                                  2496     4809
## meansTRUE:sd_diff15                                        2131     3784
## lo_ground_truth:conditionHOPs                              1013     2465
## lo_ground_truth:conditionintervals                          966     2011
## lo_ground_truth:conditionQDPs                               938     2076
## meansTRUE:conditionHOPs                                    1682     3250
## meansTRUE:conditionintervals                               1296     2837
## meansTRUE:conditionQDPs                                    1272     2405
## sd_diff15:conditionHOPs                                    3094     5492
## sd_diff15:conditionintervals                               2308     3973
## sd_diff15:conditionQDPs                                    2572     4404
## lo_ground_truth:meansTRUE:sd_diff15                        2117     3513
## lo_ground_truth:meansTRUE:conditionHOPs                    2902     4918
## lo_ground_truth:meansTRUE:conditionintervals               3386     6188
## lo_ground_truth:meansTRUE:conditionQDPs                    3259     6243
## lo_ground_truth:sd_diff15:conditionHOPs                    3127     5422
## lo_ground_truth:sd_diff15:conditionintervals               2847     5773
## lo_ground_truth:sd_diff15:conditionQDPs                    2780     5355
## meansTRUE:sd_diff15:conditionHOPs                          2808     5188
## meansTRUE:sd_diff15:conditionintervals                     2298     3884
## meansTRUE:sd_diff15:conditionQDPs                          2536     5075
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          2663     5009
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     2401     3984
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          2483     4674
## sigma_lo_ground_truth                                      1492     2465
## sigma_meansTRUE                                            1755     3479
## sigma_sd_diff15                                            2374     4410
## sigma_lo_ground_truth:meansTRUE                            3591     5784
## sigma_lo_ground_truth:sd_diff15                            3412     5977
## sigma_meansTRUE:sd_diff15                                  2984     5166
## sigma_lo_ground_truth:meansTRUE:sd_diff15                  3489     5940
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).
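
To put the divergence warning in proportion, it helps to express the count as a share of the saved draws. The sketch below is a minimal helper, assuming the sampler diagnostics are available as a long-format table with columns Chain, Iteration, Parameter, and Value (we believe nuts_params() returns this layout for a fitted model, but treat that as an assumption). The function name divergence_summary and the toy table are ours and are not data from the experiment.

```r
# Summarise divergent transitions from a long-format diagnostics table.
# Assumed columns: Chain, Iteration, Parameter, Value.
divergence_summary <- function(np) {
  div <- np[np$Parameter == "divergent__", ]
  data.frame(
    n_divergent = sum(div$Value),
    n_draws = nrow(div),
    share = sum(div$Value) / nrow(div)
  )
}

# Toy stand-in for the diagnostics table (hypothetical values,
# only here to show the expected layout).
toy_np <- data.frame(
  Chain = rep(1:2, each = 5),
  Iteration = rep(1:5, times = 2),
  Parameter = "divergent__",
  Value = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 1)
)
divergence_summary(toy_np)

# Share for the fit above, using the counts printed in the warning and summary
# (assumes the 182 divergences are counted among the 10000 saved draws).
182 / 10000
```

Even a small share of divergent transitions can signal that the sampler is not exploring part of the posterior, so the share is a rough descriptive figure rather than a threshold for accepting the fit.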

Mixed Effects of Trial Order on Mean Response

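Building on our model with ground truth as a predictor of sigma, this model adds mixed effects of trial order on mean response. Because trial interacts with lo_ground_truth, the slope of the LLO model can change over the course of the experiment, which effectively models a learning effect on the mean response at each level of ground truth.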
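
To make the role of trial order concrete, the mean submodel can be written out for worker j on observation i at the reference levels of means, sd_diff, and condition, where the remaining terms drop out under R's default treatment coding. The symbols are our own shorthand and do not appear in the model code: x is lo_ground_truth, t is trial, the betas are population-level effects, and the u terms are worker-level deviations.

```latex
\begin{aligned}
\mu_{ij} &= (\beta_0 + u_{0j}) + (\beta_1 + u_{1j})\,x_{ij}
          + (\beta_2 + u_{2j})\,t_{ij} + (\beta_3 + u_{3j})\,x_{ij}\,t_{ij} \\
\frac{\partial \mu_{ij}}{\partial x_{ij}} &= (\beta_1 + u_{1j}) + (\beta_3 + u_{3j})\,t_{ij}
\end{aligned}
```

The second line shows that each worker's LLO slope is itself a linear function of trial, so a nonzero interaction term means that sensitivity to the ground truth drifts as the worker completes more trials.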

# minimal LLO model with random effects for means, sd_diff, and trial, plus ground truth as a predictor in the sigma submodel
m.m.llo.r_means.sd.trial.sigma_gt <- brm(
  data = model_df, family = "gaussian",
  formula = bf(lo_p_sup ~  (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition + lo_ground_truth*condition*trial,
               sigma ~ (1 + lo_ground_truth|worker_id) + lo_ground_truth),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_trial_sigma_gt3")
summary(m.m.llo.r_means.sd.trial.sigma_gt)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition + lo_ground_truth * condition * trial 
##          sigma ~ (1 + lo_ground_truth | worker_id) + lo_ground_truth
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                Estimate Est.Error l-95% CI
## sd(Intercept)                                      0.07      0.01     0.06
## sd(lo_ground_truth)                                0.41      0.01     0.38
## sd(trial)                                          0.03      0.02     0.00
## sd(meansTRUE)                                      0.03      0.01     0.02
## sd(sd_diff15)                                      0.08      0.01     0.07
## sd(lo_ground_truth:trial)                          0.28      0.02     0.25
## sd(meansTRUE:sd_diff15)                            0.06      0.01     0.04
## sd(sigma_Intercept)                                1.25      0.03     1.19
## sd(sigma_lo_ground_truth)                          0.45      0.01     0.42
## cor(Intercept,lo_ground_truth)                    -0.43      0.09    -0.58
## cor(Intercept,trial)                               0.16      0.23    -0.34
## cor(lo_ground_truth,trial)                        -0.20      0.24    -0.61
## cor(Intercept,meansTRUE)                          -0.02      0.18    -0.37
## cor(lo_ground_truth,meansTRUE)                    -0.64      0.13    -0.84
## cor(trial,meansTRUE)                               0.16      0.25    -0.36
## cor(Intercept,sd_diff15)                           0.06      0.11    -0.16
## cor(lo_ground_truth,sd_diff15)                     0.00      0.09    -0.17
## cor(trial,sd_diff15)                               0.11      0.21    -0.33
## cor(meansTRUE,sd_diff15)                           0.03      0.16    -0.29
## cor(Intercept,lo_ground_truth:trial)              -0.30      0.09    -0.48
## cor(lo_ground_truth,lo_ground_truth:trial)         0.38      0.06     0.26
## cor(trial,lo_ground_truth:trial)                  -0.18      0.23    -0.59
## cor(meansTRUE,lo_ground_truth:trial)              -0.16      0.16    -0.46
## cor(sd_diff15,lo_ground_truth:trial)              -0.02      0.09    -0.19
## cor(Intercept,meansTRUE:sd_diff15)                -0.38      0.14    -0.63
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.27      0.14    -0.01
## cor(trial,meansTRUE:sd_diff15)                     0.06      0.23    -0.40
## cor(meansTRUE,meansTRUE:sd_diff15)                -0.07      0.19    -0.43
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.31      0.12    -0.54
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)     0.02      0.13    -0.24
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.74      0.02    -0.77
##                                                u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                                      0.08 1.00     3140     5794
## sd(lo_ground_truth)                                0.43 1.00     1884     5204
## sd(trial)                                          0.06 1.00     1276     3767
## sd(meansTRUE)                                      0.05 1.00     1266     3688
## sd(sd_diff15)                                      0.10 1.00     4351     6757
## sd(lo_ground_truth:trial)                          0.31 1.00     2720     5994
## sd(meansTRUE:sd_diff15)                            0.07 1.00     3349     5452
## sd(sigma_Intercept)                                1.32 1.00     2285     4270
## sd(sigma_lo_ground_truth)                          0.47 1.00     3299     5987
## cor(Intercept,lo_ground_truth)                    -0.24 1.00      348      790
## cor(Intercept,trial)                               0.57 1.00     5497     7326
## cor(lo_ground_truth,trial)                         0.31 1.00     4534     7555
## cor(Intercept,meansTRUE)                           0.35 1.00     2077     4976
## cor(lo_ground_truth,meansTRUE)                    -0.34 1.00     3228     5854
## cor(trial,meansTRUE)                               0.61 1.00     2335     4563
## cor(Intercept,sd_diff15)                           0.27 1.00     3127     5812
## cor(lo_ground_truth,sd_diff15)                     0.18 1.00     3941     7927
## cor(trial,sd_diff15)                               0.50 1.01      312      893
## cor(meansTRUE,sd_diff15)                           0.35 1.00      688     1323
## cor(Intercept,lo_ground_truth:trial)              -0.12 1.00      962     2781
## cor(lo_ground_truth,lo_ground_truth:trial)         0.50 1.00     6000     7388
## cor(trial,lo_ground_truth:trial)                   0.31 1.00      320      883
## cor(meansTRUE,lo_ground_truth:trial)               0.16 1.00      359      887
## cor(sd_diff15,lo_ground_truth:trial)               0.15 1.00     2375     5128
## cor(Intercept,meansTRUE:sd_diff15)                -0.11 1.00     3970     7025
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.52 1.00     4785     7182
## cor(trial,meansTRUE:sd_diff15)                     0.49 1.00     1325     2513
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.32 1.00     2270     5051
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.05 1.00     3540     6986
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)     0.27 1.00     1986     6033
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.70 1.00     3770     6475
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                 -0.01      0.01
## sigma_Intercept                                           -1.50      0.05
## lo_ground_truth                                            0.38      0.03
## meansTRUE                                                 -0.02      0.01
## sd_diff15                                                  0.03      0.01
## conditionHOPs                                             -0.05      0.02
## conditionintervals                                        -0.02      0.01
## conditionQDPs                                              0.01      0.01
## trial                                                     -0.04      0.01
## lo_ground_truth:meansTRUE                                 -0.02      0.01
## lo_ground_truth:sd_diff15                                  0.10      0.01
## meansTRUE:sd_diff15                                        0.02      0.01
## lo_ground_truth:conditionHOPs                             -0.02      0.05
## lo_ground_truth:conditionintervals                        -0.06      0.05
## lo_ground_truth:conditionQDPs                              0.14      0.05
## meansTRUE:conditionHOPs                                    0.03      0.02
## meansTRUE:conditionintervals                               0.03      0.01
## meansTRUE:conditionQDPs                                   -0.01      0.01
## sd_diff15:conditionHOPs                                    0.02      0.02
## sd_diff15:conditionintervals                               0.01      0.02
## sd_diff15:conditionQDPs                                   -0.02      0.02
## lo_ground_truth:trial                                      0.10      0.03
## conditionHOPs:trial                                        0.07      0.03
## conditionintervals:trial                                   0.03      0.02
## conditionQDPs:trial                                        0.03      0.02
## lo_ground_truth:meansTRUE:sd_diff15                        0.05      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.01      0.02
## lo_ground_truth:meansTRUE:conditionintervals              -0.01      0.02
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.00      0.02
## lo_ground_truth:sd_diff15:conditionHOPs                    0.06      0.02
## lo_ground_truth:sd_diff15:conditionintervals              -0.02      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                    0.01      0.01
## meansTRUE:sd_diff15:conditionHOPs                          0.02      0.03
## meansTRUE:sd_diff15:conditionintervals                    -0.02      0.02
## meansTRUE:sd_diff15:conditionQDPs                          0.01      0.02
## lo_ground_truth:conditionHOPs:trial                       -0.10      0.04
## lo_ground_truth:conditionintervals:trial                   0.01      0.04
## lo_ground_truth:conditionQDPs:trial                       -0.01      0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.06      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     0.02      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.03      0.02
## sigma_lo_ground_truth                                      0.35      0.02
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.03     0.01 1.00
## sigma_Intercept                                           -1.60    -1.40 1.00
## lo_ground_truth                                            0.31     0.44 1.00
## meansTRUE                                                 -0.04    -0.00 1.00
## sd_diff15                                                  0.01     0.06 1.00
## conditionHOPs                                             -0.08    -0.01 1.00
## conditionintervals                                        -0.05     0.00 1.00
## conditionQDPs                                             -0.01     0.04 1.00
## trial                                                     -0.06    -0.01 1.00
## lo_ground_truth:meansTRUE                                 -0.04     0.00 1.00
## lo_ground_truth:sd_diff15                                  0.09     0.12 1.00
## meansTRUE:sd_diff15                                       -0.00     0.05 1.00
## lo_ground_truth:conditionHOPs                             -0.11     0.08 1.00
## lo_ground_truth:conditionintervals                        -0.15     0.03 1.00
## lo_ground_truth:conditionQDPs                              0.04     0.23 1.00
## meansTRUE:conditionHOPs                                   -0.00     0.07 1.00
## meansTRUE:conditionintervals                               0.00     0.05 1.00
## meansTRUE:conditionQDPs                                   -0.04     0.02 1.00
## sd_diff15:conditionHOPs                                   -0.03     0.06 1.00
## sd_diff15:conditionintervals                              -0.02     0.05 1.00
## sd_diff15:conditionQDPs                                   -0.05     0.02 1.00
## lo_ground_truth:trial                                      0.05     0.16 1.00
## conditionHOPs:trial                                        0.02     0.12 1.00
## conditionintervals:trial                                  -0.01     0.06 1.00
## conditionQDPs:trial                                       -0.01     0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15                        0.03     0.08 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.04     0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.04     0.02 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.03     0.03 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.03     0.09 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.04     0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                   -0.02     0.04 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.03     0.07 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.05     0.02 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.03     0.05 1.00
## lo_ground_truth:conditionHOPs:trial                       -0.19    -0.02 1.00
## lo_ground_truth:conditionintervals:trial                  -0.06     0.09 1.00
## lo_ground_truth:conditionQDPs:trial                       -0.09     0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.10    -0.01 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.01     0.06 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.07     0.00 1.00
## sigma_lo_ground_truth                                      0.32     0.39 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  4325     5812
## sigma_Intercept                                            1241     2419
## lo_ground_truth                                            2043     4011
## meansTRUE                                                  4265     7230
## sd_diff15                                                  4178     6496
## conditionHOPs                                              5047     7572
## conditionintervals                                         4445     6526
## conditionQDPs                                              3535     5704
## trial                                                      7362     8484
## lo_ground_truth:meansTRUE                                  5397     7792
## lo_ground_truth:sd_diff15                                  4899     7791
## meansTRUE:sd_diff15                                        4223     7238
## lo_ground_truth:conditionHOPs                              2530     4736
## lo_ground_truth:conditionintervals                         2268     4105
## lo_ground_truth:conditionQDPs                              2248     4180
## meansTRUE:conditionHOPs                                    5959     7246
## meansTRUE:conditionintervals                               4674     7345
## meansTRUE:conditionQDPs                                    4145     7443
## sd_diff15:conditionHOPs                                    5957     8107
## sd_diff15:conditionintervals                               4997     7785
## sd_diff15:conditionQDPs                                    4740     6988
## lo_ground_truth:trial                                      4647     7841
## conditionHOPs:trial                                        7719     8604
## conditionintervals:trial                                   7360     8575
## conditionQDPs:trial                                        7092     7100
## lo_ground_truth:meansTRUE:sd_diff15                        4533     7175
## lo_ground_truth:meansTRUE:conditionHOPs                    6238     7639
## lo_ground_truth:meansTRUE:conditionintervals               5875     7883
## lo_ground_truth:meansTRUE:conditionQDPs                    5722     8075
## lo_ground_truth:sd_diff15:conditionHOPs                    6215     7948
## lo_ground_truth:sd_diff15:conditionintervals               5246     7799
## lo_ground_truth:sd_diff15:conditionQDPs                    5462     7555
## meansTRUE:sd_diff15:conditionHOPs                          5977     8312
## meansTRUE:sd_diff15:conditionintervals                     4868     7095
## meansTRUE:sd_diff15:conditionQDPs                          5018     7536
## lo_ground_truth:conditionHOPs:trial                        5657     8005
## lo_ground_truth:conditionintervals:trial                   4630     7588
## lo_ground_truth:conditionQDPs:trial                        4693     7329
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          5525     8047
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     4919     7513
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          5144     7796
## sigma_lo_ground_truth                                      1833     3589
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Mixed Effects of Trial Order on Sigma

This model adds fixed and random effects of trial order to the sigma submodel, and lets the fixed effects on sigma vary by condition. The trial terms capture a learning effect on residual variance.
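Because the sigma submodel uses a log link (`Links: mu = identity; sigma = log` in the summary output), its coefficients add on the log scale and act multiplicatively on the residual standard deviation. The following is a minimal sketch of that arithmetic, not part of the original analysis. It plugs in the `sigma_Intercept` and `sigma_trial` point estimates reported in this model's summary, assumes the reference condition with `lo_ground_truth = 0`, and ignores random effects and posterior uncertainty. The helper `residual_sd()` and the values in `trial_grid` are hypothetical, since the scaling of `trial` in `model_df` is not shown in this section.

```r
# Illustrative only: point estimates copied from the summary output of this model.
# sigma uses a log link, so effects add on the log scale and multiply on the SD scale.
b_sigma_intercept <- -1.79  # sigma_Intercept
b_sigma_trial     <- -0.30  # sigma_trial

# Residual SD (in log odds units) implied for the reference condition
# at lo_ground_truth = 0, ignoring worker-level random effects.
# residual_sd() is a hypothetical helper, not a brms function.
residual_sd <- function(trial) {
  exp(b_sigma_intercept + b_sigma_trial * trial)
}

# Hypothetical grid of trial values; the real scaling of trial may differ.
trial_grid <- c(0, 0.5, 1)
data.frame(trial = trial_grid, sigma = residual_sd(trial_grid))

# Multiplicative change in residual SD per unit increase in trial
sd_ratio <- exp(b_sigma_trial)
sd_ratio  # about 0.74
```

Under these assumptions, a negative `sigma_trial` means the residual standard deviation shrinks as trials progress, which is the learning effect on residual variance that this model is meant to capture.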

# minimal LLO model with random effects for means, sd_diff, and trial, plus effects of ground truth, condition, and trial in the sigma submodel
m.m.llo.r_means.sd.trial.sigma_gt.trial <- brm(
  data = model_df, family = "gaussian",
  formula = bf(lo_p_sup ~  (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition + lo_ground_truth*condition*trial,
               sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial),
  prior = c(prior(normal(1, 0.5), class = b),
            prior(normal(1.3, 1), class = Intercept),
            prior(normal(0, 0.15), class = sd, group = worker_id),
            prior(normal(0, 0.3), class = b, dpar = sigma),
            prior(normal(0, 0.15), class = sd, dpar = sigma),
            prior(lkj(4), class = cor)),
  iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
  control = list(adapt_delta = 0.99, max_treedepth = 12),
  file = "model-fits/llo_mdl-min-r_means_sd_trial_sigma_gt_trial3b")
summary(m.m.llo.r_means.sd.trial.sigma_gt.trial)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition + lo_ground_truth * condition * trial 
##          sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                Estimate Est.Error l-95% CI
## sd(Intercept)                                      0.06      0.01     0.05
## sd(lo_ground_truth)                                0.39      0.01     0.37
## sd(trial)                                          0.03      0.01     0.00
## sd(meansTRUE)                                      0.04      0.01     0.02
## sd(sd_diff15)                                      0.08      0.01     0.07
## sd(lo_ground_truth:trial)                          0.24      0.02     0.21
## sd(meansTRUE:sd_diff15)                            0.06      0.01     0.04
## sd(sigma_Intercept)                                1.18      0.03     1.12
## sd(sigma_lo_ground_truth)                          0.41      0.01     0.38
## sd(sigma_trial)                                    1.19      0.04     1.12
## cor(Intercept,lo_ground_truth)                    -0.43      0.09    -0.61
## cor(Intercept,trial)                               0.19      0.22    -0.28
## cor(lo_ground_truth,trial)                        -0.30      0.22    -0.67
## cor(Intercept,meansTRUE)                          -0.00      0.17    -0.33
## cor(lo_ground_truth,meansTRUE)                    -0.66      0.11    -0.84
## cor(trial,meansTRUE)                               0.29      0.24    -0.22
## cor(Intercept,sd_diff15)                          -0.00      0.11    -0.21
## cor(lo_ground_truth,sd_diff15)                     0.01      0.09    -0.15
## cor(trial,sd_diff15)                               0.01      0.22    -0.45
## cor(meansTRUE,sd_diff15)                           0.02      0.15    -0.29
## cor(Intercept,lo_ground_truth:trial)              -0.25      0.09    -0.43
## cor(lo_ground_truth,lo_ground_truth:trial)         0.41      0.06     0.29
## cor(trial,lo_ground_truth:trial)                  -0.40      0.22    -0.73
## cor(meansTRUE,lo_ground_truth:trial)              -0.22      0.14    -0.48
## cor(sd_diff15,lo_ground_truth:trial)               0.06      0.08    -0.10
## cor(Intercept,meansTRUE:sd_diff15)                -0.36      0.13    -0.61
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.24      0.13    -0.04
## cor(trial,meansTRUE:sd_diff15)                     0.17      0.22    -0.28
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.05      0.18    -0.30
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.33      0.12    -0.54
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)    -0.13      0.12    -0.37
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.71      0.02    -0.75
## cor(sigma_Intercept,sigma_trial)                   0.10      0.04     0.02
## cor(sigma_lo_ground_truth,sigma_trial)            -0.06      0.04    -0.14
##                                                u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                                      0.07 1.00     3302     6168
## sd(lo_ground_truth)                                0.42 1.00     2223     5191
## sd(trial)                                          0.06 1.00     1258     1948
## sd(meansTRUE)                                      0.05 1.00     1255     2869
## sd(sd_diff15)                                      0.09 1.00     3533     6335
## sd(lo_ground_truth:trial)                          0.27 1.00     1830     4692
## sd(meansTRUE:sd_diff15)                            0.07 1.00     3124     5466
## sd(sigma_Intercept)                                1.25 1.00     2677     4515
## sd(sigma_lo_ground_truth)                          0.43 1.00     3720     5944
## sd(sigma_trial)                                    1.27 1.00     6034     8269
## cor(Intercept,lo_ground_truth)                    -0.24 1.00      389      907
## cor(Intercept,trial)                               0.58 1.00     6316     6887
## cor(lo_ground_truth,trial)                         0.19 1.00     4054     5940
## cor(Intercept,meansTRUE)                           0.35 1.00     1794     5101
## cor(lo_ground_truth,meansTRUE)                    -0.41 1.00     2969     4560
## cor(trial,meansTRUE)                               0.68 1.00     1831     4791
## cor(Intercept,sd_diff15)                           0.21 1.00     2371     5102
## cor(lo_ground_truth,sd_diff15)                     0.19 1.00     3281     7010
## cor(trial,sd_diff15)                               0.43 1.02      364      476
## cor(meansTRUE,sd_diff15)                           0.30 1.00      483      923
## cor(Intercept,lo_ground_truth:trial)              -0.06 1.00     1336     3068
## cor(lo_ground_truth,lo_ground_truth:trial)         0.53 1.00     5619     7916
## cor(trial,lo_ground_truth:trial)                   0.11 1.00      355      823
## cor(meansTRUE,lo_ground_truth:trial)               0.06 1.00      726     1637
## cor(sd_diff15,lo_ground_truth:trial)               0.23 1.00     2849     5863
## cor(Intercept,meansTRUE:sd_diff15)                -0.08 1.00     4202     7124
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.48 1.00     4514     8125
## cor(trial,meansTRUE:sd_diff15)                     0.56 1.00     1148     2169
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.42 1.00     2117     4244
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.08 1.00     4177     6823
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)     0.12 1.00     3506     7178
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.67 1.00     4346     6844
## cor(sigma_Intercept,sigma_trial)                   0.17 1.00     4789     6896
## cor(sigma_lo_ground_truth,sigma_trial)             0.03 1.00     3904     5935
## 
## Population-Level Effects: 
##                                                        Estimate Est.Error
## Intercept                                                 -0.01      0.01
## sigma_Intercept                                           -1.79      0.09
## lo_ground_truth                                            0.38      0.03
## meansTRUE                                                 -0.02      0.01
## sd_diff15                                                  0.04      0.01
## conditionHOPs                                             -0.04      0.02
## conditionintervals                                        -0.01      0.01
## conditionQDPs                                              0.01      0.01
## trial                                                     -0.03      0.01
## lo_ground_truth:meansTRUE                                 -0.02      0.01
## lo_ground_truth:sd_diff15                                  0.10      0.01
## meansTRUE:sd_diff15                                        0.01      0.01
## lo_ground_truth:conditionHOPs                             -0.05      0.05
## lo_ground_truth:conditionintervals                        -0.08      0.05
## lo_ground_truth:conditionQDPs                              0.14      0.05
## meansTRUE:conditionHOPs                                    0.03      0.02
## meansTRUE:conditionintervals                               0.02      0.01
## meansTRUE:conditionQDPs                                   -0.01      0.01
## sd_diff15:conditionHOPs                                    0.02      0.02
## sd_diff15:conditionintervals                               0.01      0.02
## sd_diff15:conditionQDPs                                   -0.01      0.02
## lo_ground_truth:trial                                      0.09      0.03
## conditionHOPs:trial                                        0.05      0.02
## conditionintervals:trial                                   0.02      0.02
## conditionQDPs:trial                                        0.03      0.02
## lo_ground_truth:meansTRUE:sd_diff15                        0.06      0.01
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.01      0.02
## lo_ground_truth:meansTRUE:conditionintervals               0.00      0.01
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.01      0.01
## lo_ground_truth:sd_diff15:conditionHOPs                    0.06      0.02
## lo_ground_truth:sd_diff15:conditionintervals              -0.01      0.01
## lo_ground_truth:sd_diff15:conditionQDPs                    0.01      0.01
## meansTRUE:sd_diff15:conditionHOPs                          0.03      0.02
## meansTRUE:sd_diff15:conditionintervals                    -0.00      0.02
## meansTRUE:sd_diff15:conditionQDPs                          0.01      0.02
## lo_ground_truth:conditionHOPs:trial                       -0.08      0.04
## lo_ground_truth:conditionintervals:trial                  -0.01      0.04
## lo_ground_truth:conditionQDPs:trial                        0.00      0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.07      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.00      0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.04      0.02
## sigma_lo_ground_truth                                      0.46      0.03
## sigma_conditionHOPs                                        0.59      0.12
## sigma_conditionintervals                                   0.16      0.12
## sigma_conditionQDPs                                       -0.06      0.12
## sigma_trial                                               -0.30      0.10
## sigma_lo_ground_truth:conditionHOPs                       -0.17      0.05
## sigma_lo_ground_truth:conditionintervals                  -0.10      0.05
## sigma_lo_ground_truth:conditionQDPs                       -0.02      0.05
## sigma_lo_ground_truth:trial                                0.03      0.05
## sigma_conditionHOPs:trial                                  0.09      0.14
## sigma_conditionintervals:trial                             0.14      0.14
## sigma_conditionQDPs:trial                                 -0.03      0.14
## sigma_lo_ground_truth:conditionHOPs:trial                  0.03      0.07
## sigma_lo_ground_truth:conditionintervals:trial             0.06      0.07
## sigma_lo_ground_truth:conditionQDPs:trial                 -0.03      0.07
##                                                        l-95% CI u-95% CI Rhat
## Intercept                                                 -0.03     0.01 1.00
## sigma_Intercept                                           -1.96    -1.62 1.00
## lo_ground_truth                                            0.31     0.44 1.00
## meansTRUE                                                 -0.04     0.00 1.00
## sd_diff15                                                  0.01     0.06 1.00
## conditionHOPs                                             -0.07    -0.01 1.00
## conditionintervals                                        -0.04     0.01 1.00
## conditionQDPs                                             -0.01     0.04 1.00
## trial                                                     -0.06    -0.01 1.00
## lo_ground_truth:meansTRUE                                 -0.04    -0.00 1.00
## lo_ground_truth:sd_diff15                                  0.08     0.12 1.00
## meansTRUE:sd_diff15                                       -0.01     0.04 1.00
## lo_ground_truth:conditionHOPs                             -0.14     0.04 1.00
## lo_ground_truth:conditionintervals                        -0.17     0.01 1.00
## lo_ground_truth:conditionQDPs                              0.05     0.23 1.00
## meansTRUE:conditionHOPs                                   -0.00     0.07 1.00
## meansTRUE:conditionintervals                              -0.01     0.05 1.00
## meansTRUE:conditionQDPs                                   -0.04     0.02 1.00
## sd_diff15:conditionHOPs                                   -0.03     0.06 1.00
## sd_diff15:conditionintervals                              -0.02     0.05 1.00
## sd_diff15:conditionQDPs                                   -0.05     0.02 1.00
## lo_ground_truth:trial                                      0.04     0.14 1.00
## conditionHOPs:trial                                        0.00     0.09 1.00
## conditionintervals:trial                                  -0.01     0.06 1.00
## conditionQDPs:trial                                       -0.00     0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15                        0.04     0.09 1.00
## lo_ground_truth:meansTRUE:conditionHOPs                   -0.04     0.03 1.00
## lo_ground_truth:meansTRUE:conditionintervals              -0.03     0.03 1.00
## lo_ground_truth:meansTRUE:conditionQDPs                   -0.04     0.02 1.00
## lo_ground_truth:sd_diff15:conditionHOPs                    0.03     0.10 1.00
## lo_ground_truth:sd_diff15:conditionintervals              -0.04     0.01 1.00
## lo_ground_truth:sd_diff15:conditionQDPs                   -0.02     0.03 1.00
## meansTRUE:sd_diff15:conditionHOPs                         -0.02     0.07 1.00
## meansTRUE:sd_diff15:conditionintervals                    -0.04     0.03 1.00
## meansTRUE:sd_diff15:conditionQDPs                         -0.03     0.05 1.00
## lo_ground_truth:conditionHOPs:trial                       -0.15    -0.00 1.00
## lo_ground_truth:conditionintervals:trial                  -0.08     0.06 1.00
## lo_ground_truth:conditionQDPs:trial                       -0.07     0.07 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs         -0.11    -0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals    -0.04     0.03 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs         -0.07    -0.00 1.00
## sigma_lo_ground_truth                                      0.39     0.52 1.00
## sigma_conditionHOPs                                        0.34     0.83 1.00
## sigma_conditionintervals                                  -0.07     0.40 1.00
## sigma_conditionQDPs                                       -0.29     0.18 1.00
## sigma_trial                                               -0.49    -0.11 1.00
## sigma_lo_ground_truth:conditionHOPs                       -0.26    -0.08 1.00
## sigma_lo_ground_truth:conditionintervals                  -0.19    -0.00 1.00
## sigma_lo_ground_truth:conditionQDPs                       -0.11     0.07 1.00
## sigma_lo_ground_truth:trial                               -0.06     0.13 1.00
## sigma_conditionHOPs:trial                                 -0.19     0.36 1.00
## sigma_conditionintervals:trial                            -0.14     0.41 1.00
## sigma_conditionQDPs:trial                                 -0.30     0.25 1.00
## sigma_lo_ground_truth:conditionHOPs:trial                 -0.10     0.17 1.00
## sigma_lo_ground_truth:conditionintervals:trial            -0.08     0.20 1.00
## sigma_lo_ground_truth:conditionQDPs:trial                 -0.16     0.11 1.00
##                                                        Bulk_ESS Tail_ESS
## Intercept                                                  4081     6648
## sigma_Intercept                                            1821     3365
## lo_ground_truth                                            2678     4307
## meansTRUE                                                  4072     6679
## sd_diff15                                                  4534     6510
## conditionHOPs                                              5681     7897
## conditionintervals                                         4970     7053
## conditionQDPs                                              3347     6349
## trial                                                      6496     8008
## lo_ground_truth:meansTRUE                                  4976     7467
## lo_ground_truth:sd_diff15                                  4477     7558
## meansTRUE:sd_diff15                                        4604     6923
## lo_ground_truth:conditionHOPs                              3883     6402
## lo_ground_truth:conditionintervals                         2624     5155
## lo_ground_truth:conditionQDPs                              3097     5313
## meansTRUE:conditionHOPs                                    5004     7975
## meansTRUE:conditionintervals                               4166     7118
## meansTRUE:conditionQDPs                                    3360     6887
## sd_diff15:conditionHOPs                                    5563     8427
## sd_diff15:conditionintervals                               4837     7696
## sd_diff15:conditionQDPs                                    4933     7348
## lo_ground_truth:trial                                      5184     7714
## conditionHOPs:trial                                        7734     9209
## conditionintervals:trial                                   7053     8544
## conditionQDPs:trial                                        6489     8562
## lo_ground_truth:meansTRUE:sd_diff15                        4471     7063
## lo_ground_truth:meansTRUE:conditionHOPs                    6152     8179
## lo_ground_truth:meansTRUE:conditionintervals               5286     7658
## lo_ground_truth:meansTRUE:conditionQDPs                    5111     7726
## lo_ground_truth:sd_diff15:conditionHOPs                    5426     7541
## lo_ground_truth:sd_diff15:conditionintervals               5061     7618
## lo_ground_truth:sd_diff15:conditionQDPs                    5074     7937
## meansTRUE:sd_diff15:conditionHOPs                          5463     7689
## meansTRUE:sd_diff15:conditionintervals                     4696     7370
## meansTRUE:sd_diff15:conditionQDPs                          4774     6439
## lo_ground_truth:conditionHOPs:trial                        5970     7927
## lo_ground_truth:conditionintervals:trial                   5590     7231
## lo_ground_truth:conditionQDPs:trial                        5606     8186
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs          4880     7386
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals     5119     7528
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs          4703     7392
## sigma_lo_ground_truth                                      2779     4760
## sigma_conditionHOPs                                        1790     3398
## sigma_conditionintervals                                   1928     3892
## sigma_conditionQDPs                                        1965     3892
## sigma_trial                                                6200     7085
## sigma_lo_ground_truth:conditionHOPs                        2656     5103
## sigma_lo_ground_truth:conditionintervals                   2850     4402
## sigma_lo_ground_truth:conditionQDPs                        3096     5551
## sigma_lo_ground_truth:trial                                6865     8276
## sigma_conditionHOPs:trial                                  6447     7923
## sigma_conditionintervals:trial                             6536     7854
## sigma_conditionQDPs:trial                                  5827     6938
## sigma_lo_ground_truth:conditionHOPs:trial                  7954     8168
## sigma_lo_ground_truth:conditionintervals:trial             7102     8679
## sigma_lo_ground_truth:conditionQDPs:trial                  7095     8381
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model Comparison

Each time we add a random effect, the number of parameters grows quickly, especially since the random effects in each submodel share a covariance matrix. We want to make sure these parameters improve the predictive validity of the model more than they risk overfitting. We’ll evaluate this by comparing models with WAIC, which estimates out-of-sample predictive accuracy and penalizes the effective number of parameters. The model with the smallest WAIC has the best estimated predictive validity after that penalty.
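To make the quantities reported by WAIC concrete, here is a minimal sketch of how elpd_waic, p_waic, and waic are assembled from a matrix of pointwise log-likelihoods. It assumes the standard definitions (log pointwise predictive density minus the summed posterior variance of the pointwise log-likelihood). The data and the "posterior draws" are simulated stand-ins, not draws from any model in this document, and all variable names are hypothetical.

```r
# Toy illustration only: simulated data and stand-in posterior draws.
set.seed(1)
n_draws <- 1000  # hypothetical number of posterior draws
n_obs   <- 50    # hypothetical number of observations

y <- rnorm(n_obs, mean = 0.5, sd = 1)

# Stand-in draws for a single mean parameter (not from a fitted model).
mu_draws <- rnorm(n_draws, mean = mean(y), sd = 1 / sqrt(n_obs))

# Log-likelihood matrix: one row per draw, one column per observation.
log_lik <- sapply(y, function(y_i) dnorm(y_i, mean = mu_draws, sd = 1, log = TRUE))

# Log pointwise predictive density: log of the average likelihood per observation.
lppd <- sum(log(colMeans(exp(log_lik))))

# Effective number of parameters: posterior variance of the pointwise log-likelihood.
p_waic <- sum(apply(log_lik, 2, var))

elpd_waic <- lppd - p_waic
waic      <- -2 * elpd_waic

round(c(elpd_waic = elpd_waic, p_waic = p_waic, waic = waic), 1)
```

Because waic is just -2 times elpd_waic, a lower WAIC and a higher elpd_waic say the same thing. Averaging exp(log_lik) directly is fine for a toy example of this size, but a numerically stable log-sum-exp would be preferable on real log-likelihood matrices.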

waic(
  m.m.llo,
  m.m.llo.r_means.sd,
  m.m.llo.r_means.sd.sigma_gt,
  m.m.llo.r_means.sd.trial.sigma_gt,
  m.m.llo.r_means.sd.trial.sigma_gt.trial)
## Output of model 'm.m.llo':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic -13183.3 216.9
## p_waic      2454.7  70.8
## waic       26366.7 433.8
## 
## 1125 (5.6%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.llo.r_means.sd':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic -12047.5 216.5
## p_waic      3080.7  74.1
## waic       24095.0 432.9
## 
## 1453 (7.3%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.llo.r_means.sd.sigma_gt':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -9697.7 214.1
## p_waic      2887.0  57.2
## waic       19395.5 428.3
## 
## 1292 (6.5%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -9304.8 213.4
## p_waic      3099.9  57.3
## waic       18609.5 426.8
## 
## 1480 (7.4%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt.trial':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7806.6 206.2
## p_waic      3419.6  48.1
## waic       15613.1 412.3
## 
## 1726 (8.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Model comparisons:
##                                         elpd_diff se_diff
## m.m.llo.r_means.sd.trial.sigma_gt.trial     0.0       0.0
## m.m.llo.r_means.sd.trial.sigma_gt       -1498.2      65.4
## m.m.llo.r_means.sd.sigma_gt             -1891.2      72.2
## m.m.llo.r_means.sd                      -4240.9     132.9
## m.m.llo                                 -5376.8     133.1

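The most complex model has the lowest WAIC value, and its elpd advantage over the next-best model (1498.2) is many times the standard error of that difference (65.4), so we’ll continue expanding on it.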
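As a quick check on the comparison table above, the sketch below retypes the printed elpd_diff and se_diff values by hand (it does not touch the fitted model objects) and expresses each difference in units of its standard error. The data frame and column names are made up for illustration.

```r
# Values copied from the "Model comparisons" table printed above.
comparison <- data.frame(
  model = c("m.m.llo.r_means.sd.trial.sigma_gt",
            "m.m.llo.r_means.sd.sigma_gt",
            "m.m.llo.r_means.sd",
            "m.m.llo"),
  elpd_diff = c(-1498.2, -1891.2, -4240.9, -5376.8),
  se_diff   = c(65.4, 72.2, 132.9, 133.1)
)

# Difference from the best model, in standard-error units.
comparison$diff_in_se <- comparison$elpd_diff / comparison$se_diff

# The same differences on the WAIC scale (waic = -2 * elpd_waic).
comparison$waic_diff <- -2 * comparison$elpd_diff

print(comparison)
```

Every simpler model falls short of the most complex one by more than 20 standard errors, so the ranking is not sensitive to the uncertainty in the estimated differences.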

Add Predictors for Block Order

Let’s add block order to our previous model, just to check whether the effect of the mean on judgments depends on block order. We’ll model this as a fixed effects interaction between block order and the presence/absence of means. This will be the maximal model under our strategy of model expansion.
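To see which coefficients an interaction between two logical predictors produces, here is a small sketch using base R on a hypothetical 2 x 2 design. The column names `means` and `start_means` mirror the predictors in this document, but the `toy` data frame is made up for illustration.

```r
# Hypothetical design: every combination of the two logical predictors.
toy <- expand.grid(means = c(FALSE, TRUE), start_means = c(FALSE, TRUE))

# Design matrix for main effects plus their interaction.
X <- model.matrix(~ means * start_means, data = toy)

colnames(X)
cbind(toy, X)
```

The interaction column equals 1 only for rows where both predictors are TRUE, so its coefficient captures how much the effect of showing means changes for workers whose first block included means.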

We use the same priors as we did for the previous model. Now, let’s fit the model to our data.

# hierarchical LLO model
m.max <- brm(data = model_df, family = "gaussian",
             formula = bf(lo_p_sup ~  (1 + lo_ground_truth*trial + means*sd_diff|worker_id) + lo_ground_truth*means*sd_diff*condition*start_means + lo_ground_truth*condition*trial,
                          sigma ~ (1 + lo_ground_truth + trial|worker_id) + lo_ground_truth*condition*trial + means*start_means),
             prior = c(prior(normal(1, 0.5), class = b),
                       prior(normal(1.3, 1), class = Intercept),
                       prior(normal(0, 0.15), class = sd, group = worker_id),
                       prior(normal(0, 0.3), class = b, dpar = sigma),
                       prior(normal(0, 0.15), class = sd, dpar = sigma),
                       prior(lkj(4), class = cor)),
             iter = 12000, warmup = 2000, chains = 2, cores = 2, thin = 2,
             control = list(adapt_delta = 0.99, max_treedepth = 12),
             file = "model-fits/llo_mdl-min-r_means_sd_trial_block_sigma_gt_trial_means_block-build_version")
summary(m.max)
##  Family: gaussian 
##   Links: mu = identity; sigma = log 
## Formula: lo_p_sup ~ (1 + lo_ground_truth * trial + means * sd_diff | worker_id) + lo_ground_truth * means * sd_diff * condition * start_means + lo_ground_truth * condition * trial 
##          sigma ~ (1 + lo_ground_truth + trial | worker_id) + lo_ground_truth * condition * trial + means * start_means
##    Data: model_df (Number of observations: 19924) 
## Samples: 2 chains, each with iter = 12000; warmup = 2000; thin = 2;
##          total post-warmup samples = 10000
## 
## Group-Level Effects: 
## ~worker_id (Number of levels: 623) 
##                                                Estimate Est.Error l-95% CI
## sd(Intercept)                                      0.06      0.01     0.05
## sd(lo_ground_truth)                                0.39      0.01     0.37
## sd(trial)                                          0.03      0.01     0.00
## sd(meansTRUE)                                      0.03      0.01     0.02
## sd(sd_diff15)                                      0.08      0.01     0.07
## sd(lo_ground_truth:trial)                          0.24      0.01     0.21
## sd(meansTRUE:sd_diff15)                            0.06      0.01     0.04
## sd(sigma_Intercept)                                1.18      0.03     1.12
## sd(sigma_lo_ground_truth)                          0.41      0.01     0.38
## sd(sigma_trial)                                    1.19      0.04     1.12
## cor(Intercept,lo_ground_truth)                    -0.47      0.09    -0.64
## cor(Intercept,trial)                               0.20      0.23    -0.28
## cor(lo_ground_truth,trial)                        -0.25      0.23    -0.64
## cor(Intercept,meansTRUE)                           0.04      0.18    -0.29
## cor(lo_ground_truth,meansTRUE)                    -0.60      0.13    -0.81
## cor(trial,meansTRUE)                               0.21      0.25    -0.31
## cor(Intercept,sd_diff15)                          -0.02      0.11    -0.23
## cor(lo_ground_truth,sd_diff15)                     0.03      0.09    -0.14
## cor(trial,sd_diff15)                               0.02      0.21    -0.40
## cor(meansTRUE,sd_diff15)                          -0.00      0.16    -0.34
## cor(Intercept,lo_ground_truth:trial)              -0.27      0.10    -0.45
## cor(lo_ground_truth,lo_ground_truth:trial)         0.40      0.06     0.28
## cor(trial,lo_ground_truth:trial)                  -0.36      0.23    -0.72
## cor(meansTRUE,lo_ground_truth:trial)              -0.13      0.16    -0.43
## cor(sd_diff15,lo_ground_truth:trial)               0.06      0.09    -0.10
## cor(Intercept,meansTRUE:sd_diff15)                -0.33      0.14    -0.58
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.23      0.13    -0.04
## cor(trial,meansTRUE:sd_diff15)                     0.17      0.22    -0.28
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.03      0.18    -0.33
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.30      0.12    -0.51
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)    -0.12      0.12    -0.36
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.71      0.02    -0.75
## cor(sigma_Intercept,sigma_trial)                   0.10      0.04     0.02
## cor(sigma_lo_ground_truth,sigma_trial)            -0.05      0.04    -0.14
##                                                u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)                                      0.07 1.00     3205     6294
## sd(lo_ground_truth)                                0.42 1.00     3332     6907
## sd(trial)                                          0.06 1.00     1235     2547
## sd(meansTRUE)                                      0.05 1.00     1379     2192
## sd(sd_diff15)                                      0.09 1.00     4032     7833
## sd(lo_ground_truth:trial)                          0.27 1.00     1677     5455
## sd(meansTRUE:sd_diff15)                            0.07 1.00     3520     7180
## sd(sigma_Intercept)                                1.24 1.00     2740     4467
## sd(sigma_lo_ground_truth)                          0.43 1.00     3982     6582
## sd(sigma_trial)                                    1.27 1.00     5467     7404
## cor(Intercept,lo_ground_truth)                    -0.29 1.00      587     1241
## cor(Intercept,trial)                               0.61 1.00     6065     7549
## cor(lo_ground_truth,trial)                         0.26 1.00     4894     5531
## cor(Intercept,meansTRUE)                           0.41 1.00     2402     5259
## cor(lo_ground_truth,meansTRUE)                    -0.30 1.00     2874     5664
## cor(trial,meansTRUE)                               0.63 1.00     1796     3322
## cor(Intercept,sd_diff15)                           0.19 1.00     2076     4506
## cor(lo_ground_truth,sd_diff15)                     0.20 1.00     3829     7001
## cor(trial,sd_diff15)                               0.44 1.00      345      816
## cor(meansTRUE,sd_diff15)                           0.31 1.00      627     1393
## cor(Intercept,lo_ground_truth:trial)              -0.07 1.00     1131     2593
## cor(lo_ground_truth,lo_ground_truth:trial)         0.52 1.00     6278     8237
## cor(trial,lo_ground_truth:trial)                   0.18 1.01      286      427
## cor(meansTRUE,lo_ground_truth:trial)               0.18 1.00      544     1427
## cor(sd_diff15,lo_ground_truth:trial)               0.23 1.00     2636     5672
## cor(Intercept,meansTRUE:sd_diff15)                -0.05 1.00     3043     6243
## cor(lo_ground_truth,meansTRUE:sd_diff15)           0.47 1.00     4341     8138
## cor(trial,meansTRUE:sd_diff15)                     0.57 1.00     1029     2030
## cor(meansTRUE,meansTRUE:sd_diff15)                 0.39 1.00     2070     4493
## cor(sd_diff15,meansTRUE:sd_diff15)                -0.04 1.00     3507     7193
## cor(lo_ground_truth:trial,meansTRUE:sd_diff15)     0.12 1.00     3241     6173
## cor(sigma_Intercept,sigma_lo_ground_truth)        -0.67 1.00     3969     6400
## cor(sigma_Intercept,sigma_trial)                   0.18 1.00     4842     6926
## cor(sigma_lo_ground_truth,sigma_trial)             0.03 1.00     4098     6587
## 
## Population-Level Effects: 
##                                                                        Estimate
## Intercept                                                                 -0.02
## sigma_Intercept                                                           -1.71
## lo_ground_truth                                                            0.45
## meansTRUE                                                                 -0.00
## sd_diff15                                                                  0.04
## conditionHOPs                                                             -0.09
## conditionintervals                                                        -0.01
## conditionQDPs                                                              0.02
## start_meansTRUE                                                            0.01
## trial                                                                     -0.06
## lo_ground_truth:meansTRUE                                                 -0.05
## lo_ground_truth:sd_diff15                                                  0.08
## meansTRUE:sd_diff15                                                        0.02
## lo_ground_truth:conditionHOPs                                             -0.01
## lo_ground_truth:conditionintervals                                        -0.10
## lo_ground_truth:conditionQDPs                                              0.07
## meansTRUE:conditionHOPs                                                    0.08
## meansTRUE:conditionintervals                                               0.01
## meansTRUE:conditionQDPs                                                   -0.02
## sd_diff15:conditionHOPs                                                    0.03
## sd_diff15:conditionintervals                                               0.02
## sd_diff15:conditionQDPs                                                   -0.01
## lo_ground_truth:start_meansTRUE                                           -0.14
## meansTRUE:start_meansTRUE                                                 -0.02
## sd_diff15:start_meansTRUE                                                  0.00
## conditionHOPs:start_meansTRUE                                              0.08
## conditionintervals:start_meansTRUE                                         0.00
## conditionQDPs:start_meansTRUE                                             -0.01
## lo_ground_truth:trial                                                      0.13
## conditionHOPs:trial                                                        0.01
## conditionintervals:trial                                                   0.04
## conditionQDPs:trial                                                        0.05
## lo_ground_truth:meansTRUE:sd_diff15                                        0.05
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.08
## lo_ground_truth:meansTRUE:conditionintervals                              -0.01
## lo_ground_truth:meansTRUE:conditionQDPs                                   -0.00
## lo_ground_truth:sd_diff15:conditionHOPs                                    0.06
## lo_ground_truth:sd_diff15:conditionintervals                              -0.01
## lo_ground_truth:sd_diff15:conditionQDPs                                    0.03
## meansTRUE:sd_diff15:conditionHOPs                                         -0.00
## meansTRUE:sd_diff15:conditionintervals                                    -0.02
## meansTRUE:sd_diff15:conditionQDPs                                          0.00
## lo_ground_truth:meansTRUE:start_meansTRUE                                  0.04
## lo_ground_truth:sd_diff15:start_meansTRUE                                  0.03
## meansTRUE:sd_diff15:start_meansTRUE                                       -0.01
## lo_ground_truth:conditionHOPs:start_meansTRUE                             -0.07
## lo_ground_truth:conditionintervals:start_meansTRUE                         0.03
## lo_ground_truth:conditionQDPs:start_meansTRUE                              0.14
## meansTRUE:conditionHOPs:start_meansTRUE                                   -0.09
## meansTRUE:conditionintervals:start_meansTRUE                               0.01
## meansTRUE:conditionQDPs:start_meansTRUE                                    0.02
## sd_diff15:conditionHOPs:start_meansTRUE                                   -0.02
## sd_diff15:conditionintervals:start_meansTRUE                              -0.01
## sd_diff15:conditionQDPs:start_meansTRUE                                   -0.02
## lo_ground_truth:conditionHOPs:trial                                       -0.03
## lo_ground_truth:conditionintervals:trial                                   0.00
## lo_ground_truth:conditionQDPs:trial                                       -0.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                         -0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                         -0.04
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        0.04
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.12
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               0.03
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                   -0.01
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    0.01
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               0.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                   -0.02
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          0.04
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     0.02
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          0.01
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.07
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.00
## sigma_lo_ground_truth                                                      0.45
## sigma_conditionHOPs                                                        0.58
## sigma_conditionintervals                                                   0.16
## sigma_conditionQDPs                                                       -0.05
## sigma_trial                                                               -0.45
## sigma_meansTRUE                                                            0.00
## sigma_start_meansTRUE                                                     -0.04
## sigma_lo_ground_truth:conditionHOPs                                       -0.17
## sigma_lo_ground_truth:conditionintervals                                  -0.10
## sigma_lo_ground_truth:conditionQDPs                                       -0.03
## sigma_lo_ground_truth:trial                                                0.02
## sigma_conditionHOPs:trial                                                  0.06
## sigma_conditionintervals:trial                                             0.12
## sigma_conditionQDPs:trial                                                 -0.06
## sigma_meansTRUE:start_meansTRUE                                           -0.23
## sigma_lo_ground_truth:conditionHOPs:trial                                  0.05
## sigma_lo_ground_truth:conditionintervals:trial                             0.06
## sigma_lo_ground_truth:conditionQDPs:trial                                 -0.02
##                                                                        Est.Error
## Intercept                                                                   0.02
## sigma_Intercept                                                             0.09
## lo_ground_truth                                                             0.04
## meansTRUE                                                                   0.02
## sd_diff15                                                                   0.02
## conditionHOPs                                                               0.03
## conditionintervals                                                          0.02
## conditionQDPs                                                               0.02
## start_meansTRUE                                                             0.02
## trial                                                                       0.02
## lo_ground_truth:meansTRUE                                                   0.02
## lo_ground_truth:sd_diff15                                                   0.02
## meansTRUE:sd_diff15                                                         0.02
## lo_ground_truth:conditionHOPs                                               0.07
## lo_ground_truth:conditionintervals                                          0.06
## lo_ground_truth:conditionQDPs                                               0.06
## meansTRUE:conditionHOPs                                                     0.03
## meansTRUE:conditionintervals                                                0.02
## meansTRUE:conditionQDPs                                                     0.03
## sd_diff15:conditionHOPs                                                     0.04
## sd_diff15:conditionintervals                                                0.03
## sd_diff15:conditionQDPs                                                     0.03
## lo_ground_truth:start_meansTRUE                                             0.06
## meansTRUE:start_meansTRUE                                                   0.03
## sd_diff15:start_meansTRUE                                                   0.03
## conditionHOPs:start_meansTRUE                                               0.04
## conditionintervals:start_meansTRUE                                          0.03
## conditionQDPs:start_meansTRUE                                               0.03
## lo_ground_truth:trial                                                       0.03
## conditionHOPs:trial                                                         0.04
## conditionintervals:trial                                                    0.03
## conditionQDPs:trial                                                         0.03
## lo_ground_truth:meansTRUE:sd_diff15                                         0.02
## lo_ground_truth:meansTRUE:conditionHOPs                                     0.04
## lo_ground_truth:meansTRUE:conditionintervals                                0.03
## lo_ground_truth:meansTRUE:conditionQDPs                                     0.03
## lo_ground_truth:sd_diff15:conditionHOPs                                     0.03
## lo_ground_truth:sd_diff15:conditionintervals                                0.02
## lo_ground_truth:sd_diff15:conditionQDPs                                     0.03
## meansTRUE:sd_diff15:conditionHOPs                                           0.04
## meansTRUE:sd_diff15:conditionintervals                                      0.03
## meansTRUE:sd_diff15:conditionQDPs                                           0.03
## lo_ground_truth:meansTRUE:start_meansTRUE                                   0.03
## lo_ground_truth:sd_diff15:start_meansTRUE                                   0.02
## meansTRUE:sd_diff15:start_meansTRUE                                         0.03
## lo_ground_truth:conditionHOPs:start_meansTRUE                               0.09
## lo_ground_truth:conditionintervals:start_meansTRUE                          0.09
## lo_ground_truth:conditionQDPs:start_meansTRUE                               0.09
## meansTRUE:conditionHOPs:start_meansTRUE                                     0.05
## meansTRUE:conditionintervals:start_meansTRUE                                0.04
## meansTRUE:conditionQDPs:start_meansTRUE                                     0.04
## sd_diff15:conditionHOPs:start_meansTRUE                                     0.05
## sd_diff15:conditionintervals:start_meansTRUE                                0.04
## sd_diff15:conditionQDPs:start_meansTRUE                                     0.04
## lo_ground_truth:conditionHOPs:trial                                         0.05
## lo_ground_truth:conditionintervals:trial                                    0.04
## lo_ground_truth:conditionQDPs:trial                                         0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                           0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                      0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                           0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                         0.03
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                     0.05
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE                0.04
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                     0.04
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                     0.04
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE                0.03
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                     0.03
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                           0.05
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                      0.04
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                           0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE           0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE      0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE           0.04
## sigma_lo_ground_truth                                                       0.03
## sigma_conditionHOPs                                                         0.12
## sigma_conditionintervals                                                    0.12
## sigma_conditionQDPs                                                         0.12
## sigma_trial                                                                 0.10
## sigma_meansTRUE                                                             0.03
## sigma_start_meansTRUE                                                       0.07
## sigma_lo_ground_truth:conditionHOPs                                         0.05
## sigma_lo_ground_truth:conditionintervals                                    0.05
## sigma_lo_ground_truth:conditionQDPs                                         0.05
## sigma_lo_ground_truth:trial                                                 0.05
## sigma_conditionHOPs:trial                                                   0.14
## sigma_conditionintervals:trial                                              0.14
## sigma_conditionQDPs:trial                                                   0.14
## sigma_meansTRUE:start_meansTRUE                                             0.05
## sigma_lo_ground_truth:conditionHOPs:trial                                   0.07
## sigma_lo_ground_truth:conditionintervals:trial                              0.07
## sigma_lo_ground_truth:conditionQDPs:trial                                   0.07
##                                                                        l-95% CI
## Intercept                                                                 -0.05
## sigma_Intercept                                                           -1.89
## lo_ground_truth                                                            0.36
## meansTRUE                                                                 -0.04
## sd_diff15                                                                  0.00
## conditionHOPs                                                             -0.14
## conditionintervals                                                        -0.05
## conditionQDPs                                                             -0.02
## start_meansTRUE                                                           -0.03
## trial                                                                     -0.10
## lo_ground_truth:meansTRUE                                                 -0.09
## lo_ground_truth:sd_diff15                                                  0.04
## meansTRUE:sd_diff15                                                       -0.03
## lo_ground_truth:conditionHOPs                                             -0.14
## lo_ground_truth:conditionintervals                                        -0.22
## lo_ground_truth:conditionQDPs                                             -0.05
## meansTRUE:conditionHOPs                                                    0.02
## meansTRUE:conditionintervals                                              -0.04
## meansTRUE:conditionQDPs                                                   -0.07
## sd_diff15:conditionHOPs                                                   -0.04
## sd_diff15:conditionintervals                                              -0.04
## sd_diff15:conditionQDPs                                                   -0.07
## lo_ground_truth:start_meansTRUE                                           -0.26
## meansTRUE:start_meansTRUE                                                 -0.07
## sd_diff15:start_meansTRUE                                                 -0.05
## conditionHOPs:start_meansTRUE                                              0.01
## conditionintervals:start_meansTRUE                                        -0.05
## conditionQDPs:start_meansTRUE                                             -0.07
## lo_ground_truth:trial                                                      0.06
## conditionHOPs:trial                                                       -0.06
## conditionintervals:trial                                                  -0.02
## conditionQDPs:trial                                                       -0.01
## lo_ground_truth:meansTRUE:sd_diff15                                        0.00
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.15
## lo_ground_truth:meansTRUE:conditionintervals                              -0.06
## lo_ground_truth:meansTRUE:conditionQDPs                                   -0.06
## lo_ground_truth:sd_diff15:conditionHOPs                                    0.00
## lo_ground_truth:sd_diff15:conditionintervals                              -0.05
## lo_ground_truth:sd_diff15:conditionQDPs                                   -0.03
## meansTRUE:sd_diff15:conditionHOPs                                         -0.09
## meansTRUE:sd_diff15:conditionintervals                                    -0.08
## meansTRUE:sd_diff15:conditionQDPs                                         -0.06
## lo_ground_truth:meansTRUE:start_meansTRUE                                 -0.02
## lo_ground_truth:sd_diff15:start_meansTRUE                                 -0.02
## meansTRUE:sd_diff15:start_meansTRUE                                       -0.07
## lo_ground_truth:conditionHOPs:start_meansTRUE                             -0.25
## lo_ground_truth:conditionintervals:start_meansTRUE                        -0.14
## lo_ground_truth:conditionQDPs:start_meansTRUE                             -0.04
## meansTRUE:conditionHOPs:start_meansTRUE                                   -0.19
## meansTRUE:conditionintervals:start_meansTRUE                              -0.06
## meansTRUE:conditionQDPs:start_meansTRUE                                   -0.05
## sd_diff15:conditionHOPs:start_meansTRUE                                   -0.11
## sd_diff15:conditionintervals:start_meansTRUE                              -0.08
## sd_diff15:conditionQDPs:start_meansTRUE                                   -0.09
## lo_ground_truth:conditionHOPs:trial                                       -0.13
## lo_ground_truth:conditionintervals:trial                                  -0.09
## lo_ground_truth:conditionQDPs:trial                                       -0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                         -0.10
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                    -0.04
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                         -0.10
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                       -0.02
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.02
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE              -0.05
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                   -0.09
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                   -0.06
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE              -0.05
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                   -0.08
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                         -0.06
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                    -0.05
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                         -0.07
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE         -0.16
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE    -0.11
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE         -0.08
## sigma_lo_ground_truth                                                      0.39
## sigma_conditionHOPs                                                        0.36
## sigma_conditionintervals                                                  -0.07
## sigma_conditionQDPs                                                       -0.29
## sigma_trial                                                               -0.65
## sigma_meansTRUE                                                           -0.06
## sigma_start_meansTRUE                                                     -0.18
## sigma_lo_ground_truth:conditionHOPs                                       -0.27
## sigma_lo_ground_truth:conditionintervals                                  -0.19
## sigma_lo_ground_truth:conditionQDPs                                       -0.12
## sigma_lo_ground_truth:trial                                               -0.08
## sigma_conditionHOPs:trial                                                 -0.22
## sigma_conditionintervals:trial                                            -0.15
## sigma_conditionQDPs:trial                                                 -0.33
## sigma_meansTRUE:start_meansTRUE                                           -0.32
## sigma_lo_ground_truth:conditionHOPs:trial                                 -0.09
## sigma_lo_ground_truth:conditionintervals:trial                            -0.07
## sigma_lo_ground_truth:conditionQDPs:trial                                 -0.15
##                                                                        u-95% CI
## Intercept                                                                  0.01
## sigma_Intercept                                                           -1.53
## lo_ground_truth                                                            0.54
## meansTRUE                                                                  0.03
## sd_diff15                                                                  0.08
## conditionHOPs                                                             -0.03
## conditionintervals                                                         0.03
## conditionQDPs                                                              0.06
## start_meansTRUE                                                            0.06
## trial                                                                     -0.01
## lo_ground_truth:meansTRUE                                                 -0.00
## lo_ground_truth:sd_diff15                                                  0.11
## meansTRUE:sd_diff15                                                        0.06
## lo_ground_truth:conditionHOPs                                              0.12
## lo_ground_truth:conditionintervals                                         0.03
## lo_ground_truth:conditionQDPs                                              0.19
## meansTRUE:conditionHOPs                                                    0.15
## meansTRUE:conditionintervals                                               0.06
## meansTRUE:conditionQDPs                                                    0.03
## sd_diff15:conditionHOPs                                                    0.10
## sd_diff15:conditionintervals                                               0.07
## sd_diff15:conditionQDPs                                                    0.05
## lo_ground_truth:start_meansTRUE                                           -0.02
## meansTRUE:start_meansTRUE                                                  0.04
## sd_diff15:start_meansTRUE                                                  0.05
## conditionHOPs:start_meansTRUE                                              0.16
## conditionintervals:start_meansTRUE                                         0.06
## conditionQDPs:start_meansTRUE                                              0.04
## lo_ground_truth:trial                                                      0.19
## conditionHOPs:trial                                                        0.09
## conditionintervals:trial                                                   0.10
## conditionQDPs:trial                                                        0.11
## lo_ground_truth:meansTRUE:sd_diff15                                        0.10
## lo_ground_truth:meansTRUE:conditionHOPs                                   -0.01
## lo_ground_truth:meansTRUE:conditionintervals                               0.04
## lo_ground_truth:meansTRUE:conditionQDPs                                    0.05
## lo_ground_truth:sd_diff15:conditionHOPs                                    0.12
## lo_ground_truth:sd_diff15:conditionintervals                               0.04
## lo_ground_truth:sd_diff15:conditionQDPs                                    0.08
## meansTRUE:sd_diff15:conditionHOPs                                          0.08
## meansTRUE:sd_diff15:conditionintervals                                     0.04
## meansTRUE:sd_diff15:conditionQDPs                                          0.07
## lo_ground_truth:meansTRUE:start_meansTRUE                                  0.10
## lo_ground_truth:sd_diff15:start_meansTRUE                                  0.07
## meansTRUE:sd_diff15:start_meansTRUE                                        0.04
## lo_ground_truth:conditionHOPs:start_meansTRUE                              0.11
## lo_ground_truth:conditionintervals:start_meansTRUE                         0.20
## lo_ground_truth:conditionQDPs:start_meansTRUE                              0.31
## meansTRUE:conditionHOPs:start_meansTRUE                                    0.02
## meansTRUE:conditionintervals:start_meansTRUE                               0.09
## meansTRUE:conditionQDPs:start_meansTRUE                                    0.10
## sd_diff15:conditionHOPs:start_meansTRUE                                    0.07
## sd_diff15:conditionintervals:start_meansTRUE                               0.06
## sd_diff15:conditionQDPs:start_meansTRUE                                    0.05
## lo_ground_truth:conditionHOPs:trial                                        0.08
## lo_ground_truth:conditionintervals:trial                                   0.09
## lo_ground_truth:conditionQDPs:trial                                        0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          0.05
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     0.08
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          0.03
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        0.09
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    0.21
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               0.10
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    0.07
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    0.08
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               0.06
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    0.04
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          0.15
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     0.10
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          0.09
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          0.02
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     0.03
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          0.07
## sigma_lo_ground_truth                                                      0.52
## sigma_conditionHOPs                                                        0.81
## sigma_conditionintervals                                                   0.40
## sigma_conditionQDPs                                                        0.18
## sigma_trial                                                               -0.25
## sigma_meansTRUE                                                            0.06
## sigma_start_meansTRUE                                                      0.10
## sigma_lo_ground_truth:conditionHOPs                                       -0.08
## sigma_lo_ground_truth:conditionintervals                                  -0.01
## sigma_lo_ground_truth:conditionQDPs                                        0.07
## sigma_lo_ground_truth:trial                                                0.11
## sigma_conditionHOPs:trial                                                  0.34
## sigma_conditionintervals:trial                                             0.39
## sigma_conditionQDPs:trial                                                  0.22
## sigma_meansTRUE:start_meansTRUE                                           -0.13
## sigma_lo_ground_truth:conditionHOPs:trial                                  0.18
## sigma_lo_ground_truth:conditionintervals:trial                             0.20
## sigma_lo_ground_truth:conditionQDPs:trial                                  0.12
##                                                                        Rhat
## Intercept                                                              1.00
## sigma_Intercept                                                        1.00
## lo_ground_truth                                                        1.00
## meansTRUE                                                              1.00
## sd_diff15                                                              1.00
## conditionHOPs                                                          1.00
## conditionintervals                                                     1.00
## conditionQDPs                                                          1.00
## start_meansTRUE                                                        1.00
## trial                                                                  1.00
## lo_ground_truth:meansTRUE                                              1.00
## lo_ground_truth:sd_diff15                                              1.00
## meansTRUE:sd_diff15                                                    1.00
## lo_ground_truth:conditionHOPs                                          1.00
## lo_ground_truth:conditionintervals                                     1.00
## lo_ground_truth:conditionQDPs                                          1.00
## meansTRUE:conditionHOPs                                                1.00
## meansTRUE:conditionintervals                                           1.00
## meansTRUE:conditionQDPs                                                1.00
## sd_diff15:conditionHOPs                                                1.00
## sd_diff15:conditionintervals                                           1.00
## sd_diff15:conditionQDPs                                                1.00
## lo_ground_truth:start_meansTRUE                                        1.00
## meansTRUE:start_meansTRUE                                              1.00
## sd_diff15:start_meansTRUE                                              1.00
## conditionHOPs:start_meansTRUE                                          1.00
## conditionintervals:start_meansTRUE                                     1.00
## conditionQDPs:start_meansTRUE                                          1.00
## lo_ground_truth:trial                                                  1.00
## conditionHOPs:trial                                                    1.00
## conditionintervals:trial                                               1.00
## conditionQDPs:trial                                                    1.00
## lo_ground_truth:meansTRUE:sd_diff15                                    1.00
## lo_ground_truth:meansTRUE:conditionHOPs                                1.00
## lo_ground_truth:meansTRUE:conditionintervals                           1.00
## lo_ground_truth:meansTRUE:conditionQDPs                                1.00
## lo_ground_truth:sd_diff15:conditionHOPs                                1.00
## lo_ground_truth:sd_diff15:conditionintervals                           1.00
## lo_ground_truth:sd_diff15:conditionQDPs                                1.00
## meansTRUE:sd_diff15:conditionHOPs                                      1.00
## meansTRUE:sd_diff15:conditionintervals                                 1.00
## meansTRUE:sd_diff15:conditionQDPs                                      1.00
## lo_ground_truth:meansTRUE:start_meansTRUE                              1.00
## lo_ground_truth:sd_diff15:start_meansTRUE                              1.00
## meansTRUE:sd_diff15:start_meansTRUE                                    1.00
## lo_ground_truth:conditionHOPs:start_meansTRUE                          1.00
## lo_ground_truth:conditionintervals:start_meansTRUE                     1.00
## lo_ground_truth:conditionQDPs:start_meansTRUE                          1.00
## meansTRUE:conditionHOPs:start_meansTRUE                                1.00
## meansTRUE:conditionintervals:start_meansTRUE                           1.00
## meansTRUE:conditionQDPs:start_meansTRUE                                1.00
## sd_diff15:conditionHOPs:start_meansTRUE                                1.00
## sd_diff15:conditionintervals:start_meansTRUE                           1.00
## sd_diff15:conditionQDPs:start_meansTRUE                                1.00
## lo_ground_truth:conditionHOPs:trial                                    1.00
## lo_ground_truth:conditionintervals:trial                               1.00
## lo_ground_truth:conditionQDPs:trial                                    1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                    1.00
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                1.00
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE           1.00
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                1.00
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                1.00
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE           1.00
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                1.00
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                      1.00
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                 1.00
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE      1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE 1.00
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE      1.00
## sigma_lo_ground_truth                                                  1.00
## sigma_conditionHOPs                                                    1.00
## sigma_conditionintervals                                               1.00
## sigma_conditionQDPs                                                    1.00
## sigma_trial                                                            1.00
## sigma_meansTRUE                                                        1.00
## sigma_start_meansTRUE                                                  1.00
## sigma_lo_ground_truth:conditionHOPs                                    1.00
## sigma_lo_ground_truth:conditionintervals                               1.00
## sigma_lo_ground_truth:conditionQDPs                                    1.00
## sigma_lo_ground_truth:trial                                            1.00
## sigma_conditionHOPs:trial                                              1.00
## sigma_conditionintervals:trial                                         1.00
## sigma_conditionQDPs:trial                                              1.00
## sigma_meansTRUE:start_meansTRUE                                        1.00
## sigma_lo_ground_truth:conditionHOPs:trial                              1.00
## sigma_lo_ground_truth:conditionintervals:trial                         1.00
## sigma_lo_ground_truth:conditionQDPs:trial                              1.00
##                                                                        Bulk_ESS
## Intercept                                                                  2580
## sigma_Intercept                                                            1655
## lo_ground_truth                                                            3504
## meansTRUE                                                                  2453
## sd_diff15                                                                  2639
## conditionHOPs                                                              3514
## conditionintervals                                                         2964
## conditionQDPs                                                              2871
## start_meansTRUE                                                            2462
## trial                                                                      3463
## lo_ground_truth:meansTRUE                                                  2696
## lo_ground_truth:sd_diff15                                                  2505
## meansTRUE:sd_diff15                                                        2658
## lo_ground_truth:conditionHOPs                                              4181
## lo_ground_truth:conditionintervals                                         3571
## lo_ground_truth:conditionQDPs                                              3670
## meansTRUE:conditionHOPs                                                    3465
## meansTRUE:conditionintervals                                               2593
## meansTRUE:conditionQDPs                                                    2767
## sd_diff15:conditionHOPs                                                    3785
## sd_diff15:conditionintervals                                               3151
## sd_diff15:conditionQDPs                                                    3202
## lo_ground_truth:start_meansTRUE                                            3467
## meansTRUE:start_meansTRUE                                                  2400
## sd_diff15:start_meansTRUE                                                  2584
## conditionHOPs:start_meansTRUE                                              3524
## conditionintervals:start_meansTRUE                                         2934
## conditionQDPs:start_meansTRUE                                              2567
## lo_ground_truth:trial                                                      4230
## conditionHOPs:trial                                                        4865
## conditionintervals:trial                                                   4120
## conditionQDPs:trial                                                        3855
## lo_ground_truth:meansTRUE:sd_diff15                                        2540
## lo_ground_truth:meansTRUE:conditionHOPs                                    3416
## lo_ground_truth:meansTRUE:conditionintervals                               2894
## lo_ground_truth:meansTRUE:conditionQDPs                                    2967
## lo_ground_truth:sd_diff15:conditionHOPs                                    3141
## lo_ground_truth:sd_diff15:conditionintervals                               2823
## lo_ground_truth:sd_diff15:conditionQDPs                                    2842
## meansTRUE:sd_diff15:conditionHOPs                                          3533
## meansTRUE:sd_diff15:conditionintervals                                     3032
## meansTRUE:sd_diff15:conditionQDPs                                          3287
## lo_ground_truth:meansTRUE:start_meansTRUE                                  2635
## lo_ground_truth:sd_diff15:start_meansTRUE                                  2445
## meansTRUE:sd_diff15:start_meansTRUE                                        2625
## lo_ground_truth:conditionHOPs:start_meansTRUE                              4090
## lo_ground_truth:conditionintervals:start_meansTRUE                         3643
## lo_ground_truth:conditionQDPs:start_meansTRUE                              3720
## meansTRUE:conditionHOPs:start_meansTRUE                                    3535
## meansTRUE:conditionintervals:start_meansTRUE                               2619
## meansTRUE:conditionQDPs:start_meansTRUE                                    2646
## sd_diff15:conditionHOPs:start_meansTRUE                                    3806
## sd_diff15:conditionintervals:start_meansTRUE                               3192
## sd_diff15:conditionQDPs:start_meansTRUE                                    3112
## lo_ground_truth:conditionHOPs:trial                                        5206
## lo_ground_truth:conditionintervals:trial                                   4917
## lo_ground_truth:conditionQDPs:trial                                        5060
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          3055
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     2877
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          2801
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        2424
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    3423
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               2877
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    2890
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    3444
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               2946
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    2849
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          3464
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     3157
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          3243
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          3116
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     2893
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          2780
## sigma_lo_ground_truth                                                      2449
## sigma_conditionHOPs                                                        1577
## sigma_conditionintervals                                                   1646
## sigma_conditionQDPs                                                        1924
## sigma_trial                                                                5088
## sigma_meansTRUE                                                            8196
## sigma_start_meansTRUE                                                      2687
## sigma_lo_ground_truth:conditionHOPs                                        2485
## sigma_lo_ground_truth:conditionintervals                                   2546
## sigma_lo_ground_truth:conditionQDPs                                        2642
## sigma_lo_ground_truth:trial                                                7275
## sigma_conditionHOPs:trial                                                  5451
## sigma_conditionintervals:trial                                             4927
## sigma_conditionQDPs:trial                                                  5451
## sigma_meansTRUE:start_meansTRUE                                            8703
## sigma_lo_ground_truth:conditionHOPs:trial                                  7402
## sigma_lo_ground_truth:conditionintervals:trial                             7054
## sigma_lo_ground_truth:conditionQDPs:trial                                  7836
##                                                                        Tail_ESS
## Intercept                                                                  4762
## sigma_Intercept                                                            3213
## lo_ground_truth                                                            5086
## meansTRUE                                                                  4637
## sd_diff15                                                                  5257
## conditionHOPs                                                              5745
## conditionintervals                                                         5420
## conditionQDPs                                                              4919
## start_meansTRUE                                                            4345
## trial                                                                      5812
## lo_ground_truth:meansTRUE                                                  5245
## lo_ground_truth:sd_diff15                                                  5313
## meansTRUE:sd_diff15                                                        5779
## lo_ground_truth:conditionHOPs                                              6754
## lo_ground_truth:conditionintervals                                         5562
## lo_ground_truth:conditionQDPs                                              6501
## meansTRUE:conditionHOPs                                                    6165
## meansTRUE:conditionintervals                                               4680
## meansTRUE:conditionQDPs                                                    5192
## sd_diff15:conditionHOPs                                                    6343
## sd_diff15:conditionintervals                                               5559
## sd_diff15:conditionQDPs                                                    6196
## lo_ground_truth:start_meansTRUE                                            5426
## meansTRUE:start_meansTRUE                                                  4675
## sd_diff15:start_meansTRUE                                                  5065
## conditionHOPs:start_meansTRUE                                              6332
## conditionintervals:start_meansTRUE                                         4873
## conditionQDPs:start_meansTRUE                                              4070
## lo_ground_truth:trial                                                      6496
## conditionHOPs:trial                                                        7227
## conditionintervals:trial                                                   6041
## conditionQDPs:trial                                                        6534
## lo_ground_truth:meansTRUE:sd_diff15                                        4524
## lo_ground_truth:meansTRUE:conditionHOPs                                    6114
## lo_ground_truth:meansTRUE:conditionintervals                               5694
## lo_ground_truth:meansTRUE:conditionQDPs                                    5268
## lo_ground_truth:sd_diff15:conditionHOPs                                    6623
## lo_ground_truth:sd_diff15:conditionintervals                               5344
## lo_ground_truth:sd_diff15:conditionQDPs                                    5599
## meansTRUE:sd_diff15:conditionHOPs                                          6233
## meansTRUE:sd_diff15:conditionintervals                                     5547
## meansTRUE:sd_diff15:conditionQDPs                                          5799
## lo_ground_truth:meansTRUE:start_meansTRUE                                  4775
## lo_ground_truth:sd_diff15:start_meansTRUE                                  5190
## meansTRUE:sd_diff15:start_meansTRUE                                        5272
## lo_ground_truth:conditionHOPs:start_meansTRUE                              6134
## lo_ground_truth:conditionintervals:start_meansTRUE                         6001
## lo_ground_truth:conditionQDPs:start_meansTRUE                              6013
## meansTRUE:conditionHOPs:start_meansTRUE                                    5655
## meansTRUE:conditionintervals:start_meansTRUE                               4972
## meansTRUE:conditionQDPs:start_meansTRUE                                    5144
## sd_diff15:conditionHOPs:start_meansTRUE                                    7108
## sd_diff15:conditionintervals:start_meansTRUE                               5579
## sd_diff15:conditionQDPs:start_meansTRUE                                    5919
## lo_ground_truth:conditionHOPs:trial                                        7002
## lo_ground_truth:conditionintervals:trial                                   7467
## lo_ground_truth:conditionQDPs:trial                                        7159
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs                          6430
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals                     5071
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs                          4493
## lo_ground_truth:meansTRUE:sd_diff15:start_meansTRUE                        4851
## lo_ground_truth:meansTRUE:conditionHOPs:start_meansTRUE                    6432
## lo_ground_truth:meansTRUE:conditionintervals:start_meansTRUE               5980
## lo_ground_truth:meansTRUE:conditionQDPs:start_meansTRUE                    5374
## lo_ground_truth:sd_diff15:conditionHOPs:start_meansTRUE                    6616
## lo_ground_truth:sd_diff15:conditionintervals:start_meansTRUE               5255
## lo_ground_truth:sd_diff15:conditionQDPs:start_meansTRUE                    5729
## meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE                          6080
## meansTRUE:sd_diff15:conditionintervals:start_meansTRUE                     5534
## meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE                          5861
## lo_ground_truth:meansTRUE:sd_diff15:conditionHOPs:start_meansTRUE          6537
## lo_ground_truth:meansTRUE:sd_diff15:conditionintervals:start_meansTRUE     5046
## lo_ground_truth:meansTRUE:sd_diff15:conditionQDPs:start_meansTRUE          4737
## sigma_lo_ground_truth                                                      4855
## sigma_conditionHOPs                                                        3284
## sigma_conditionintervals                                                   3342
## sigma_conditionQDPs                                                        3827
## sigma_trial                                                                7328
## sigma_meansTRUE                                                            8609
## sigma_start_meansTRUE                                                      4777
## sigma_lo_ground_truth:conditionHOPs                                        4456
## sigma_lo_ground_truth:conditionintervals                                   5287
## sigma_lo_ground_truth:conditionQDPs                                        4578
## sigma_lo_ground_truth:trial                                                8503
## sigma_conditionHOPs:trial                                                  7735
## sigma_conditionintervals:trial                                             7564
## sigma_conditionQDPs:trial                                                  7428
## sigma_meansTRUE:start_meansTRUE                                            8607
## sigma_lo_ground_truth:conditionHOPs:trial                                  8543
## sigma_lo_ground_truth:conditionintervals:trial                             8990
## sigma_lo_ground_truth:conditionQDPs:trial                                  8508
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Model Comparison

Let’s see how this maximal model compares with our previous model.

waic(m.m.llo.r_means.sd.trial.sigma_gt.trial, m.max)
## Output of model 'm.m.llo.r_means.sd.trial.sigma_gt.trial':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7806.6 206.2
## p_waic      3419.6  48.1
## waic       15613.1 412.3
## 
## 1726 (8.7%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Output of model 'm.max':
## 
## Computed from 10000 by 19924 log-likelihood matrix
## 
##           Estimate    SE
## elpd_waic  -7761.2 206.2
## p_waic      3445.7  48.3
## waic       15522.4 412.3
## 
## 1752 (8.8%) p_waic estimates greater than 0.4. We recommend trying loo instead. 
## 
## Model comparisons:
##                                         elpd_diff se_diff
## m.max                                     0.0       0.0  
## m.m.llo.r_means.sd.trial.sigma_gt.trial -45.4      12.3

It looks like adding predictors for block order improves fit somewhat, so we’ll run with the maximal version of the model that we managed to fit.
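To put "somewhat" in context, one rough heuristic is to compare the difference in expected log predictive density with its standard error. The sketch below uses only the two numbers copied from the comparison table above. The two-standard-error interval is an assumption of this sketch, not a formal test, and the WAIC warnings above (which recommend trying loo) mean the result should be read with caution.

```r
# Values copied from the model comparison output above
elpd_diff <- -45.4   # previous model relative to m.max
se_diff   <- 12.3

# Rough heuristic (an assumption of this sketch, not a formal test):
# how many standard errors separate the two models?
ratio <- abs(elpd_diff) / se_diff

# Approximate interval of +/- 2 standard errors around the difference
interval <- elpd_diff + c(-2, 2) * se_diff

round(ratio, 2)
interval
```

Under these assumptions the whole interval lies below zero, which is consistent with preferring m.max.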

Predictive Checks

Let’s check our posterior predictive distribution.

# posterior predictive check
model_df %>%
  select(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.max, prediction = "lo_p_sup", seed = 1234, n = 500) %>%
  mutate(
    # transform to probability units
    post_p_sup = plogis(lo_p_sup)
  ) %>%
  ggplot(aes(x = post_p_sup)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior predictive distribution for probability of superiority") +
  theme(panel.grid = element_blank())

How do these predictions compare to the observed data?

# data density
model_df %>%
  ggplot(aes(x = p_superiority)) +
  geom_density(fill = "black", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Data distribution for probability of superiority") +
  theme(panel.grid = element_blank())

Let’s take a look at predictions per worker and visualization condition to get a more granular sense of our model fit.

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.max, n = 500) %>%
  ggplot(aes(x = lo_ground_truth, y = lo_p_sup, color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = .prediction), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(model_df$lo_ground_truth, c(0, 1)),
                  ylim = quantile(model_df$lo_p_sup, c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)

What does this look like in probability units?

model_check_df %>%
  group_by(lo_ground_truth, worker_id, means, sd_diff, condition, trial, start_means) %>%
  add_predicted_draws(m.max, n = 500) %>%
  ggplot(aes(x = plogis(lo_ground_truth), y = plogis(lo_p_sup), color = condition, fill = condition)) +
  geom_abline(intercept = 0, slope = 1, size = 1, alpha = .3, color = "red", linetype = "dashed") + # ground truth
  stat_lineribbon(aes(y = plogis(.prediction)), .width = c(.95, .80, .50), alpha = .25) +
  geom_point(data = model_check_df) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) + 
  coord_cartesian(xlim = quantile(plogis(model_df$lo_ground_truth), c(0, 1)),
                  ylim = quantile(plogis(model_df$lo_p_sup), c(0, 1))) +
  theme_bw() +
  theme(panel.grid = element_blank()) + 
  facet_wrap(~ worker_id)
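
The conversion to probability units above relies on plogis(), the inverse logit in base R. As a minimal sketch of what that transformation does, the example below converts a few log odds values to probabilities and shows how a slope below 1 compresses responses toward 0.5. The slope of 0.5 and the intercept of 0 are hypothetical values chosen for illustration, not estimates from our model.

```r
# A few log odds values (hypothetical, for illustration only)
lo <- c(-3, -1, 0, 1, 3)

# plogis() is the inverse logit: log odds -> probability
p <- plogis(lo)

# qlogis() is the logit: probability -> log odds (round trip)
round_trip_ok <- isTRUE(all.equal(qlogis(p), lo))

# A hypothetical LLO slope below 1 (with an intercept of 0)
# pulls responses toward the crossover point at p = 0.5
slope <- 0.5
p_biased <- plogis(slope * lo)

data.frame(lo, p = round(p, 3), p_biased = round(p_biased, 3))
```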

Order Effects

What does the posterior for the slope look like when means are present vs absent? We’ll split this based on uncertainty shown and block order (marginalizing across visualization conditions) to see if there is a difference in the effect of extrinsic means per block.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, start_means, .draw) %>%  # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # marginalize out visualization condition and trial (no weights given, so a simple average)
  ggplot(aes(x = slope, group = means, color = means, fill = means)) +
  geom_density(alpha = 0.35) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes for mean present/absent") +
  theme(panel.grid = element_blank()) +
  facet_grid(start_means ~ sd_diff)

This effect suggests that adding means is most harmful at low uncertainty when users start with them, and adding means is helpful at high uncertainty in the second block of trials. This is a strange order effect, and it may be burying the signal for the

What does the posterior for the slope in each visualization condition look like, marginalizing across other predictors? Again, we’ll facet by block order to see if this has any impact on our results.

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, start_means, .draw) %>%       # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # marginalize out means, sd_diff, and trial (no weights given, so a simple average)
  ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank()) +
  facet_grid(start_means ~ .)

It looks like LLO slopes are smaller (more biased) when users start the task with extrinsic means, except with quantile dotplots.
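
The slope calculations above take the difference between fitted values at log odds of 1 and 0. As a minimal sketch of why that difference equals the slope, the example below uses simulated data and an ordinary lm() fit rather than our brms model. The data, the coefficients, and the variable names are all hypothetical.

```r
set.seed(1234)

# Simulated data (hypothetical): responses with a true slope of 0.6
x <- rnorm(200)
y <- 0.2 + 0.6 * x + rnorm(200, sd = 0.5)
fit <- lm(y ~ x)

# Fitted values at x = 0 and x = 1
fits <- predict(fit, newdata = data.frame(x = c(0, 1)))

# Their difference recovers the slope coefficient
slope_from_fits <- unname(fits[2] - fits[1])
slope_from_coef <- unname(coef(fit)["x"])

all.equal(slope_from_fits, slope_from_coef)
```

In a model with interactions, this difference depends on the values of the other predictors, which is why the pipelines above compute it within each combination of predictors and within each posterior draw before averaging.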

What if we break these marginal effects down into simple effects for the interaction of the presence/absence of the mean, uncertainty shown, block order, and visualization condition?

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%                      # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%              # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, start_means, .draw) %>%   # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%                   # marginalize out trial (no weights given, so a simple average)
  ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) + 
  labs(subtitle = "Posterior for slopes for means * sd * block order * visualization condition") +
  theme_minimal() +
  facet_grid(start_means ~ sd_diff)

It looks like when participants start the task with extrinsic means, their LLO slopes become less biased when those means are removed, especially when uncertainty is low. When participants start the task without means, by contrast, LLO slopes become less biased when means are added only for intervals and densities at high levels of uncertainty. For HOPs, on the other hand, adding extrinsic means in the second block makes slopes more biased (despite the fact that users have more practice with HOPs by the second block).
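
The marginalization step in these pipelines averages slopes within each posterior draw. Because weighted.mean() is called without weights, it gives every cell equal weight. The sketch below illustrates this on a tiny set of made-up draws and shows how unequal weights would change the result. The draws and the weights are hypothetical.

```r
# Hypothetical posterior draws of slopes: 3 draws x 2 levels of a nuisance predictor
draws <- data.frame(
  .draw = rep(1:3, each = 2),
  start_means = rep(c(FALSE, TRUE), times = 3),
  slope = c(0.50, 0.70, 0.40, 0.60, 0.55, 0.65)
)

# Without weights, weighted.mean() matches mean()
same_as_mean <- isTRUE(all.equal(weighted.mean(draws$slope), mean(draws$slope)))

# Average within each draw to marginalize out start_means
marginal <- tapply(draws$slope, draws$.draw, mean)

# Unequal weights (hypothetical cell proportions) shift the marginal estimate
w <- c(0.25, 0.75)
weighted <- sapply(split(draws$slope, draws$.draw), weighted.mean, w = w)

marginal
weighted
```

Equal weighting fits a balanced design. If the cells being averaged over were unbalanced, passing cell proportions as weights would be one way to reflect that.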

Main Findings Adjusting for Order Effects

What is the effect of extrinsic means at high and low uncertainty in our four visualization conditions after adjusting for order effects?

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%                      # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%              # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%      # marginalize out other predictors by taking a weighted average
  ggplot(aes(x = slope, y = condition, group = means, fill = means)) +
  stat_slabh(alpha = 0.35) +
  labs(
    title = "Posterior Slopes in Linear Log Odds Model",
    x = "Slope",
    y = "Visualization",
    fill = "Means Present"
  ) +
  theme_minimal() +
  # theme(panel.grid.minor = element_blank()) +
  facet_grid(. ~ sd_diff)

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%                      # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%              # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%      # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%            # contrast mean present - absent
  ggplot(aes(x = slope, y = condition)) +
  stat_slabh(alpha = 0.35) +
  labs(
    title = "Effect of Means on LLO Slopes",
    x = "Slope Difference (Means present - absent)",
    y = "Visualization"
  ) +
  theme_minimal() +
  # theme(panel.grid.minor = element_blank()) +
  facet_grid(. ~ sd_diff)

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%                      # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%              # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(means, sd_diff, condition, .draw) %>%   # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%      # marginalize out other predictors by taking a weighted average
  compare_levels(slope, by = means) %>%            # contrast mean present - absent
  compare_levels(slope, by = sd_diff) %>%          # contrast sd_diff levels (each level minus the first level)
  ggplot(aes(x = slope, y = condition)) +
  stat_slabh(alpha = 0.35) +
  labs(
    title = "Interaction of Means and Uncertainty on LLO Slopes",
    x = "Slope Difference (Effect of means at high - low uncertainty)",
    y = "Visualization"
  ) +
  theme_minimal()
  # theme(panel.grid.minor = element_blank())

It looks like extrinsic means lead to greater underestimation of probability of superiority (lower LLO slopes) when uncertainty is low, regardless of visualization condition. This is the effect we expected to see but which eluded us until we controlled for order effects. Surprisingly, the impact of extrinsic means does not seem to depend on the intrinsic salience of the mean in the uncertainty visualization conditions. At high levels of uncertainty, extrinsic means improve slopes for intervals and densities but still reduce slopes for HOPs. These results suggest that adding extrinsic means is not a good design choice for HOPs or when the distributions visualized on a common axis differ in their variance.
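
The contrasts above are computed within each posterior draw before being summarized. As a minimal sketch of that idea in base R, the example below subtracts hypothetical slopes for means absent from slopes for means present, draw by draw, and then summarizes the differences. The draws are simulated independently for illustration. Real posterior draws are correlated across parameters, which is the reason for differencing per draw rather than comparing summaries.

```r
set.seed(1234)
n_draws <- 1000

# Hypothetical posterior draws of slopes with means absent and present
slope_absent  <- rnorm(n_draws, mean = 0.60, sd = 0.05)
slope_present <- rnorm(n_draws, mean = 0.50, sd = 0.05)

# Contrast within each draw (means present - absent)
slope_diff <- slope_present - slope_absent

# Summarize the contrast: median with a 95% interval
quantile(slope_diff, c(0.025, 0.5, 0.975))

# Share of draws in which means reduce the slope
p_reduced <- mean(slope_diff < 0)
p_reduced
```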

What about the slopes in each visualization condition after adjusting for order effects?

model_df %>%
  group_by(means, sd_diff, condition, trial, start_means) %>%
  data_grid(lo_ground_truth = c(0, 1)) %>%          # get fitted draws (in log odds units) only for ground truth of 0 and 1
  add_fitted_draws(m.max, re_formula = NA) %>%
  compare_levels(.value, by = lo_ground_truth) %>%  # calculate the difference between fits at 1 and 0 (i.e., slope)
  rename(slope = .value) %>%
  group_by(condition, .draw) %>%       # group by predictors to keep
  summarise(slope = weighted.mean(slope)) %>%       # marginalize out all other predictors (no weights given, so a simple average)
  ggplot(aes(x = slope, group = condition, color = condition, fill = condition)) +
  geom_density(alpha = 0.35) +
  scale_fill_brewer(type = "qual", palette = 2) +
  scale_color_brewer(type = "qual", palette = 2) +
  scale_x_continuous(expression(slope), expand = c(0, 0)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Posterior for slopes by visualization condition") +
  theme(panel.grid = element_blank())